Data analysis trend 2015 2016 v071

Page 0 * Strictly Confidential. Big Data Analytics as a Service: The Evolution of the Data Analysis Market, Reading the Trends. 2015. 12. 10 Chunmk

Transcript of Data analysis trend 2015 2016 v071

Page 1: Data analysis trend 2015 2016 v071

Page 0

* Strictly Confidential

Big Data Analytics as a Service

The Evolution of the Data Analysis Market: Reading the Trends

2015. 12. 10 Chunmk

Page 2: Data analysis trend 2015 2016 v071

Page 1

A 10-Second Story

Page 3: Data analysis trend 2015 2016 v071

Page 2

A 10-Second Story

Data analysis is rooted in statistics, which has a pretty long history. It is said that statistics began in ancient Egypt, when Egypt was taking a periodic census for building the pyramids.

Throughout history, statistics has played an important role for governments all across the world in the creation of censuses, which were used for various governmental planning activities (including, of course, taxation).

Page 4: Data analysis trend 2015 2016 v071

Page 3

Contents

I. Evolution of Data Analysis

II. Data Analysis System – 3 Pillars

III. Big Data, Open Source, Cloud Computing, Data Analysis

IV. New Era – Data Analysis, Chaos

V. Data Consumer’s Needs

VI. New Trend – Citizen Data Scientist / Smart Data Discovery

Page 5: Data analysis trend 2015 2016 v071

Page 4

Understanding the Evolution of Data Analysis Systems

Understanding the evolution of data analysis systems from three perspectives:

• Database Management Technology
• Development of Business Intelligence & Analytic Platform
• Technologies and Packages for Statistical Processing

1960s
Database management: Flat File Based (tape-based storage, batch reporting)
BI & analytic platform: Query Modules & Report Generators (batch querying and reporting, report generators)
Statistical processing: Niche Statistical Subroutines (social science, clinical trials, agriculture)
Stage labels: Routinization; Querying & Reporting; Statistical Computation

1970s
Database management: Navigational DBMS (RDBMS emerged in the late 1970s)
BI & analytic platform: Early DSS Tools (commercial tools for building DSS)
Statistical processing: Statistical Software (pharma and social science; SPSS/SAS incorporated)
Stage labels: Modularization; Decision Support & Modeling; 1st Gen Statistical Processing

1980s
Database management: Relational DBMS (RDBMS solutions matured, personal databases for the PC)
BI & analytic platform: DSS & 4GL Environments (4GL, EIS, spreadsheet, descriptive analytics)
Statistical processing: PC-based Statistical Packages (other industries; PC-based, graphics, expert systems)
Stage labels: Abstraction; Analytical Processing; 2nd Gen Statistical Processing

1990s
Database management: Distributed DBMS (distributed architecture, clustering)
BI & analytic platform: Data Warehouse & BI (BI tool market grew rapidly, web-based analytics)
Statistical processing: Early Data Mining Tools (vendors and solutions)
Stage labels: Scaling & Distribution; Enterprise Performance Management; Data Mining

2000s
Database management: Post-Relational DBMS (unstructured data, non-relational data model, large-scale distributed data)
BI & analytic platform: Data Processing & Analytic Platform (large-scale data processing, unstructured and real-time analytics, big data analytics)
Statistical processing: Data Processing & Analytics Platforms (open source R-based statistical platforms, NLP, text analysis)
Stage labels: Specialization & Extension; Next Gen Data Processing; Next Gen Data Processing

Timeline annotations: AI hyped; ML started; new ML invented

* Compiled from Max Kanaskar’s “BIG DATA TECHNOLOGY SERIES”

Page 6: Data analysis trend 2015 2016 v071

Page 5

Understanding the Evolution of Data Analysis Systems

Understanding the evolution of data analysis systems from three perspectives

Technologies and Packages for Statistical Processing

Page 7: Data analysis trend 2015 2016 v071

Page 6

Understanding the Evolution of Data Analysis Systems

Understanding the evolution of data analysis systems from three perspectives

Database Management Technology

Page 8: Data analysis trend 2015 2016 v071

Page 7

Understanding the Evolution of Data Analysis Systems

Understanding the evolution of data analysis systems from three perspectives

Development of Business Intelligence & Analytic Platform

Page 9: Data analysis trend 2015 2016 v071

Page 8

Data Analysis System – 3 Pillars

1. The Emergence of Big Data – the recent definition, characterized by the 5 Vs

VOLUME
Data Object Size: small to big
Data Object Quantity: few to many

VARIETY
Data Sources: few to many
Contents Types: few to many
Structure Types: structured to unstructured
Semantic Diversity: low to high

VELOCITY
Acquisition Rate: slow to fast
Update Rate: slow to fast

VERACITY
Known Data Sources, Provenance, Data Integrity, Governance

VALUE
Prescriptive, Predictive, Decisions, Recommend, Findings, Objectives

* NIST, 2014

too big (volume), arrives too fast (velocity), changes too fast (variability), contains too much noise (veracity), too diverse (variety) to be processed within a local computing structure using traditional approaches and techniques

* ISO, 2014

Page 10: Data analysis trend 2015 2016 v071

Page 9

Data Analysis System – 3 Pillars

Big data is no longer an emerging technology

In August 2015, big data dropped off Gartner's Hype Cycle.

Machine Learning and Citizen Data Science* appeared for the first time (new trends in data analysis are replacing big data).

* people on the business side that may have some data skills, possibly from a math or even social science degree

Figure annotation: Big Data was positioned here in 2014

Page 11: Data analysis trend 2015 2016 v071

Page 10

Data Analysis System - 3 Pillars

2. The Shift to the Cloud Environment

The ability to ask previously un-askable questions is the emerging power of the cloud.

Cloud computing is a transformative force addressing size, speed, and scale, with a low cost of entry and very high potential benefits.

Large-scale image processing, sensor data correlation, social network analysis, encryption/decryption, data mining, simulations, and pattern recognition

* Source: Booz Allen Hamilton

Page 12: Data analysis trend 2015 2016 v071

Page 11

Data Analysis System - 3 Pillars

Massive Data Analytics and the Cloud

Figure labels: HDFS; Commercial hardware; resilience; elasticity; scalability; Multi tenancy; Virtualization

Data Cloud
• Computing architecture for large-scale data processing and analytics
• Designed to operate at trillions of operations/day, petabytes of storage
• Designed for performance, scale, and data processing
• Characterized by run-time data models and simplified development models

Utility Cloud
• Computing services for outsourced IT operations
• Concurrent, independent, multi-tenant user population
• Service offerings such as SaaS, PaaS, and IaaS
• Characterized by data segmentation, hosted applications, low cost of ownership, and elasticity

* Source: Booz Allen Hamilton
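To put the Data Cloud's "trillions of operations/day" design point in per-second terms, the conversion below is a rough back-of-the-envelope sketch. It assumes exactly one trillion operations spread evenly over a day; the constant and function names are illustrative and do not come from the Booz Allen Hamilton material.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def ops_per_second(ops_per_day: float) -> float:
    """Convert a daily operation count into an average per-second rate."""
    return ops_per_day / SECONDS_PER_DAY


# Assumption: "trillions" is taken at its lower bound, one trillion per day.
one_trillion = 1_000_000_000_000
rate = ops_per_second(one_trillion)
print(f"{rate:,.0f} operations per second on average")
```

Under this assumption the sustained average is roughly 11.6 million operations per second, before accounting for any peaks in load.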

Page 13: Data analysis trend 2015 2016 v071

Page 12

Data Analysis System - 3 Pillars

Various cloud-based “as a Service” models

• Data Analytics as a Service
• Database as a Service
• Storage as a Service
• Backup as a Service …
• Insights as a Service

Page 14: Data analysis trend 2015 2016 v071

Page 13

Data Analysis System - 3 Pillars

Open source software has brought revolutionary disruption to the data analysis market

Traditional (proprietary software)

Page 15: Data analysis trend 2015 2016 v071

Page 14

Data Analysis System - 3 Pillars

Open source software has brought revolutionary disruption to the data analysis market

Types of open source software related to data analysis and big data

Category: Open source software (50)
Big Data Analysis Platforms and Tools: Hadoop, MapReduce, GridGain, HPCC, Storm
Databases/Data Warehouses: CouchDB, OrientDB, Terrastore, FlockDB, Hibari, Riak, Hypertable, BigData, Hive, InfoBright Community Edition, Infinispan, Redis, Cassandra, HBase, MongoDB, Neo4j
Business Intelligence: Talend, Jaspersoft, Palo BI Suite/Jedox, Pentaho, SpagoBI, KNIME, BIRT/Actuate
Data Mining: RapidMiner/RapidAnalytics, Mahout, Orange, Weka, jHepWork, KEEL, SPMF, Rattle, Gluster, Hadoop Distributed File System
Programming Languages: Pig/Pig Latin, R, ECL
Big Data Search: Lucene, Solr
Data Aggregation and Transfer: Sqoop, Flume, Chukwa
Miscellaneous Big Data Tools: Terracotta, Avro, Oozie, Zookeeper

Apache Foundation projects: about 230 as of October

Page 16: Data analysis trend 2015 2016 v071

Page 15

Big Data, Open Source, Cloud Computing, Data Analysis

The combination of ‘data analysis’ and 'big data-open source-cloud computing' opens up a new universe of opportunities at many levels and in many places.

Aspect | Traditional Data Analysis | Data Analysis New Era
Big Data processing | Slow processing | Massive, fast, distributed processing
Computing Power | Scale up, on premise | Scale out, off premise (cloud)
S/W | Proprietary s/w | Open source s/w
Data | Structured data | Structured & unstructured data, graph data
Analysis | Statistical analysis | ML, data mining, network analysis, text mining, etc.
Value | Limited value & insight | Quick discovery of knowledge and value

Page 17: Data analysis trend 2015 2016 v071

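Page 16

New Era – Data Analysis, Chaos

A variety of companies are each entering the market: SaaS specialists, traditional data analysis companies, BI companies, and others

Findings from a survey of 10 SaaS vendors

• Specialist vendors - focused on simple analysis and visualization
• Emphasis on lightweight data analysis rather than large-volume data analysis
• Delivered as SaaS
• Except for two or three vendors, they do not apply a wide range of analysis techniques
• User-centered UI/UX
• MS/IBM/Amazon are singled out, and their three services are compared separately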

Page 18: Data analysis trend 2015 2016 v071

Page 17

New Era – Data Analysis, Chaos

Cloud machine learning is intensifying new competition in the big data analysis market

- Comparison of IBM Watson Analytics, Microsoft Azure ML, and Amazon ML

IBM Watson Analytics
Algorithms:
• Decision Tree
• Classification
• Correlation
Features:
• could not handle enterprise-scale data
• focused more on data visualization and exploration
• uses natural language (plain English questions)
• automates some tasks

Microsoft Azure ML
Algorithms: Anomaly Detection (2), Classification (14), Clustering (1), Regression (8), Feature selection (3), Evaluate (3), Score (4), Train (4), Statistical function (7), Text Analytics (4)
Features:
• user-friendly GUI
• requires knowledge of the characteristics of machine learning algorithms
• targeted to developers, data scientists and very advanced business users

Amazon ML
Algorithms: Binary classification (predicting one of two possible outcomes), Multiclass classification (predicting one of more than two outcomes), Regression (predicting a numeric value)
Features:
• narrower in scope
• data acquisition is effortless
• no infrastructure management required
• does not require data science expertise

Key characteristics: all three work to provide an easy user environment (GUI, no data scientist needed), but are still weak at big data processing.
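The three Amazon ML task types named on this slide (binary classification, multiclass classification, regression) are distinguished by the kind of target being predicted. The sketch below illustrates that distinction with a simple heuristic. The function `infer_task_type` and its rules are hypothetical, written for illustration only, and are not part of any vendor's API.

```python
from numbers import Real


def infer_task_type(targets):
    """Guess the ML task type from a list of target values (illustrative heuristic)."""
    distinct = set(targets)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct target values")
    if len(distinct) == 2:
        # Two possible outcomes, e.g. spam / not spam.
        return "binary classification"
    # Booleans are excluded because they are labels, not measurements.
    all_numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in distinct
    )
    # More than two numeric values is treated as a numeric prediction;
    # anything else is treated as a set of more than two categories.
    return "regression" if all_numeric else "multiclass classification"


print(infer_task_type(["spam", "ham", "spam"]))
print(infer_task_type(["red", "green", "blue"]))
print(infer_task_type([12.5, 18.0, 7.25, 30.1]))
```

A real service would ask the user to declare the target type rather than guess it, since numeric codes can also stand for categories.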

Page 19: Data analysis trend 2015 2016 v071

Page 18

Data Consumer’s Needs

Demand for a data analysis system that can scale at an economical cost, can be accessed easily anytime and anywhere, and can handle diverse, massive data to discover insights and act on them

Data Consumer Group
• C-level
• LoB user
• Data scientist
• Data engineer

Data Analysis Use Case Framework
Customers: 360-degree customer view, understand the market, find new market
Product: personalized website/offering, improve service, co-create & innovate
Organization: reduce risk/fraud, better organize company, understand competition

Data consumer's needs
• Accessibility
• Easy to use
• Elastic sharing
• Security
• Scalability
• Cost effective

C-level: CEO, COO, CIO, CTO, CMO…

LoB: Line of Business

Page 20: Data analysis trend 2015 2016 v071

Page 19

New Trend

The Rise of the Citizen Data Scientist

Gartner defines a "citizen data scientist"

Citizen Data Scientist
• line of business
• Not a trained data scientist or developer
• Focused on business problems
• Driven to pull together the right data, now
• Iterative workflow - one question leads to the next
• creates or generates models
• not typically a member of an analytics

Alexander Linden, Research Director at Gartner, predicts that through 2017, the number of “Citizen Data Scientists,” i.e. analytical amateurs¹, will grow five times faster than the number of highly skilled Data Scientists.

¹ At the end of his 2007 classic, Competing on Analytics, Tom Davenport predicted the rise of “analytical amateurs.”

Page 21: Data analysis trend 2015 2016 v071

Page 20

New Trend

Analytical Professionals (5-10%): Can create algorithms

Analytical Semi-Professionals (15-20%): Can use visual tools, create simple models

Analytical Amateurs (70-80%): Can use spreadsheets

Source: Competing on Analytics, Tom Davenport

¹ At the end of his 2007 classic, Competing on Analytics, Tom Davenport predicted the rise of “analytical amateurs.”
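To make the pyramid's percentages concrete, the sketch below converts them into headcount ranges. The organization size of 1,000 analytics users is a made-up figure for illustration, and `ANALYST_SHARES` and `headcount_ranges` are hypothetical names, not taken from Davenport's book.

```python
from typing import Dict, Tuple

# Shares from the pyramid above, as (low, high) fractions of analytics users.
ANALYST_SHARES: Dict[str, Tuple[float, float]] = {
    "Analytical Professionals": (0.05, 0.10),
    "Analytical Semi-Professionals": (0.15, 0.20),
    "Analytical Amateurs": (0.70, 0.80),
}


def headcount_ranges(total_users: int) -> Dict[str, Tuple[int, int]]:
    """Return the (low, high) headcount per tier, rounded to whole people."""
    return {
        tier: (round(total_users * low), round(total_users * high))
        for tier, (low, high) in ANALYST_SHARES.items()
    }


# Assumption: a hypothetical organization with 1,000 people who use analytics.
for tier, (low, high) in headcount_ranges(1000).items():
    print(f"{tier}: {low}-{high} people")
```

With these assumed numbers, the amateur tier alone accounts for 700 to 800 people, against 50 to 100 professionals.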

Page 22: Data analysis trend 2015 2016 v071

Page 21

New Trend

Algorithm Marketplaces Are Bringing the App Economy to Analytics

Source: Gartner (October 2015)

Page 23: Data analysis trend 2015 2016 v071

Page 22

New Trend

Easier-to-use analytics tools: Smart data discovery

“Smart data discovery is a next-generation data discovery capability that provides insights from advanced analytics to business users or citizen data scientists without requiring them to have traditional data scientist expertise.”

Source: Gartner (June 2015)

Page 24: Data analysis trend 2015 2016 v071

Page 23

New Trend

Easier-to-use analytics tools: Smart data discovery

Figure panels: Current Data Discovery Analytics Workflow; Emerging Smart Data Discovery Analytics Workflow

Source: Gartner (June 2015)

Page 25: Data analysis trend 2015 2016 v071

Page 24

New Trend

DAaaS functional elements

Figure labels: Business User; Citizen Data Scientist; Algorithms; Smart Data Discovery

“ ~ make new sources of information accessible, consumable and meaningful to organizations of all sizes, even ones that don't have extensive advanced analytics skills or in-house resources.”

This material will continue to be updated monthly.

Page 26: Data analysis trend 2015 2016 v071

Page 25

[email protected]

www.facebook.com/chun.myungkyu

Thank you.