Data analysis trend 2015 2016 v071
Uploaded by chun-myung-kyu; category: Data & Analytics
Page 0
* Strictly Confidential
Big Data Analytics as a Service
The Evolution of the Data Analysis Market: Reading the Trends
2015.12.10, Chunmk
Page 1
A 10-Second Story
Page 2
A 10-Second Story
Data analysis is rooted in statistics, which has a long history. Statistics is said to have begun in ancient Egypt, where periodic censuses were taken for building the pyramids.
Throughout history, statistics has played an important role for governments all across the world through censuses, which were used for various planning activities (including, of course, taxation).
Page 3
Contents
I. Evolution of Data Analysis
II. Data Analysis System: Three Pillars
III. Big Data, Open Source, Cloud Computing, Data Analysis
IV. New Era: Data Analysis, Chaos
V. Data Consumer’s Needs
VI. New Trend: Citizen Data Scientist / Smart Data Discovery
Page 4
Understanding the Evolution of Data Analysis Systems
The evolution of data analysis systems, seen from three perspectives
Three perspectives: Database Management Technology; Development of Business Intelligence & Analytic Platform; Technologies and Packages for Statistical Processing

1960s: Routinization
- Database management: Flat-file based (tape-based storage, batch reporting)
- BI & analytic platform: Query modules & report generators (batch querying & reporting, report generators)
- Statistical processing: Niche statistical subroutines (social science, clinical trials, agriculture)
- Analytics focus: Querying & reporting; statistical computation

1970s: Modularization
- Database management: Navigational DBMS (RDBMS emerged in the late 1970s)
- BI & analytic platform: Early DSS tools (commercial tools for building DSS)
- Statistical processing: Statistical software (pharma & social science; SPSS/SAS incorporated)
- Analytics focus: Decision support & modeling; 1st-gen statistical processing

1980s: Abstraction
- Database management: Relational DBMS (RDBMS solutions matured; personal databases for PC)
- BI & analytic platform: DSS & 4GL environments (4GL, EIS, spreadsheets, descriptive analytics)
- Statistical processing: PC-based statistical packages (other industries; PC-based, graphics, expert systems)
- Analytics focus: Analytical processing; 2nd-gen statistical processing

1990s: Scaling & Distribution
- Database management: Distributed DBMS (distributed architecture, clustering)
- BI & analytic platform: Data warehouse & BI (BI tool market grew rapidly; web-based analytics)
- Statistical processing: Early data mining tools (vendors & solutions)
- Analytics focus: Enterprise performance management; data mining

2000s: Specialization & Extension
- Database management: Post-relational DBMS (unstructured data, non-relational data models, large-scale distributed data)
- BI & analytic platform: Data processing & analytic platforms (large-scale data processing, unstructured and real-time analytics, big data analytics)
- Statistical processing: Data processing & analytics platforms (open-source R-based statistical platforms, NLP, text analysis)
- Analytics focus: Next-gen data processing

Timeline annotations: AI hyped; ML started; new ML invented
* Compiled from Max Kanaskar’s “BIG DATA TECHNOLOGY SERIES”
Page 5
Technologies and Packages for Statistical Processing
Page 6
Database Management Technology
Page 7
Development of Business Intelligence & Analytic Platform
Page 8
Data Analysis System: Three Pillars
1. The Emergence of Big Data: A Recent Definition Characterized by the 5 Vs
Characteristics of big data (figure):
- VALUE: objectives (findings, recommendations, decisions; predictive, prescriptive)
- VOLUME: data object size (small to big); data object quantity (few to many)
- VARIETY: data sources (few to many); content types (few to many); structure types (structured to unstructured); semantic diversity (low to high)
- VELOCITY: acquisition rate (slow to fast); update rate (slow to fast)
- VERACITY: known data sources, provenance, data integrity, governance
* NIST, 2014
Big data is data that is too big (volume), arrives too fast (velocity), changes too fast (variability), contains too much noise (veracity), or is too diverse (variety) to be processed within a local computing structure using traditional approaches and techniques.
* ISO, 2014
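Page 9
Data Analysis System: Three Pillars
Big Data Is No Longer an Emerging Technology
In August 2015, big data was dropped from Gartner's Hype Cycle.
Machine Learning and Citizen Data Science* appeared as new entries (new data analysis trends have replaced big data).
* Citizen data scientists: people on the business side who may have some data skills, possibly from a math or even a social science degree.
(Figure annotation: Big Data was positioned here in 2014.)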
Page 10
Data Analysis System: Three Pillars
2. The Shift to the Cloud Environment
The emerging power of the cloud is the ability to ask previously un-askable questions.
Cloud computing is a transformative force addressing size, speed, and scale, with a low cost of entry and very high potential benefits.
Example applications: large-scale image processing, sensor data correlation, social network analysis, encryption/decryption, data mining, simulations, and pattern recognition.
* Source: Booz Allen Hamilton
Page 11
Data Analysis System: Three Pillars
Massive Data Analytics and the Cloud
Figure elements: HDFS, commodity hardware, multi-tenancy, virtualization; resilience, elasticity, scalability

Data Cloud
- Computing architecture for large-scale data processing and analytics
- Designed to operate at trillions of operations per day and petabytes of storage
- Designed for performance, scale, and data processing
- Characterized by run-time data models and simplified development models

Utility Cloud
- Computing services for outsourced IT operations
- A concurrent, independent, multi-tenant user population
- Service offerings such as SaaS, PaaS, and IaaS
- Characterized by data segmentation, hosted applications, low cost of ownership, and elasticity

* Source: Booz Allen Hamilton
Page 12
Data Analysis System: Three Pillars
Various Cloud-Based “as a Service” Models
Data Analytics as a Service
Database as a Service
Storage as a Service
Backup as a Service …
Insights as a Service
Page 13
Data Analysis System: Three Pillars
Open-Source Software Brings Revolutionary Disruption to the Data Analysis Market
(Figure label: Traditional (proprietary software))
Page 14
Types of open-source software related to data analysis and big data (50 tools, by category):
Big Data Analysis Platforms and Tools: Hadoop, MapReduce, GridGain, HPCC, Storm
Databases/Data Warehouses: CouchDB, OrientDB, Terrastore, FlockDB, Hibari, Riak, Hypertable, BigData, Hive, InfoBright Community Edition, Infinispan, Redis, Cassandra, HBase, MongoDB, Neo4j
Business Intelligence: Talend, Jaspersoft, Palo BI Suite/Jedox, Pentaho, SpagoBI, KNIME, BIRT/Actuate
Data Mining: RapidMiner/RapidAnalytics, Mahout, Orange, Weka, jHepWork, KEEL, SPMF, Rattle
File Systems: Gluster, Hadoop Distributed File System
Programming Languages: Pig/Pig Latin, R, ECL
Big Data Search: Lucene, Solr
Data Aggregation and Transfer: Sqoop, Flume, Chukwa
Miscellaneous Big Data Tools: Terracotta, Avro, Oozie, Zookeeper
As of October, the Apache Software Foundation had some 230 projects.
Page 15
Big Data, Open Source, Cloud Computing, Data Analysis
The combination of ‘data analysis’ and ‘big data, open source, cloud computing’ opens up a new universe of opportunities at many levels and in many places.
Traditional data analysis vs. data analysis in the new era:
- Big data processing: slow processing vs. massive, fast, distributed processing
- Computing power: scale up, on premise vs. scale out, off premise (cloud)
- Software: proprietary software vs. open-source software
- Data: structured data vs. structured & unstructured data, graph data
- Analysis: statistical analysis vs. ML, data mining, network analysis, text mining, etc.
- Value: limited value & insight vs. quick discovery of knowledge and value
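The “scale out” and “massive, fast, distributed processing” entries can be illustrated with a minimal MapReduce-style word count. This is a sketch under stated assumptions, not how Hadoop itself is implemented: the function names `map_partition`, `reduce_counts`, and `scale_out_word_count` are hypothetical, and a local thread pool stands in for the cluster nodes a scale-out system would use. Each partition is counted independently (map) and the partial results are merged (reduce), so more data is handled by adding workers rather than by a bigger single machine.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(lines):
    """Map step: count words within one partition (one 'node' worth of data)."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge partial counts produced by different partitions."""
    return left + right


def scale_out_word_count(lines, n_partitions=4):
    """Split the input into partitions, map them in parallel, then reduce.

    Assumption: threads stand in for the separate machines of a real cluster.
    """
    partitions = [lines[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    docs = ["big data open source", "cloud data analysis", "open data"]
    print(scale_out_word_count(docs, n_partitions=2).most_common(2))
```

Because the map step never looks outside its own partition, the result is the same however the lines are split; that independence is what allows the same job to be spread across more machines.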
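Page 16
New Era: Data Analysis, Chaos
SaaS specialists, traditional data analysis companies, BI companies, and many others are each entering the market.
Results of a survey of 10 SaaS vendors:
Specialist vendors focus on simple analysis and visualization
- Emphasis on lightweight data analysis rather than large-scale data analysis
- Delivered as SaaS
- With two or three exceptions, they do not apply a wide range of analysis techniques
- User-centered UI/UX
Focusing on MS, IBM, and Amazon, their three services are compared separately.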
Page 17
New Era: Data Analysis, Chaos
Cloud machine learning is intensifying a new round of competition in the big data analysis market
- Comparison of IBM Watson Analytics, Microsoft Azure ML, and Amazon ML

IBM Watson Analytics
• Algorithms: decision tree, classification, correlation
• Main features: could not handle enterprise-scale data; focused more on data visualization and exploration; uses natural language (plain English questions); automates some tasks; user-friendly GUI

Microsoft Azure ML
• Algorithms (count by category): anomaly detection 2, classification 14, clustering 1, regression 8, feature selection 3, evaluate 3, score 4, train 4, statistical functions 7, text analytics 4
• Main features: requires knowledge of the characteristics of machine learning algorithms; targeted to developers, data scientists, and very advanced business users

Amazon ML
• Algorithms: binary classification (predicting one of two possible outcomes), multiclass classification (predicting one of more than two outcomes), regression (predicting a numeric value)
• Main features: narrower in scope; data acquisition is effortless; no infrastructure management required; does not require data science expertise

Summary: the vendors are working to provide an easy user environment (GUI, no data scientist required) but are still weak at big data processing.
Page 18
Data Consumer’s Needs
Demand for a data analysis system that can scale at an economical cost, can be accessed easily anytime and anywhere, and can handle diverse and massive data to discover insights and act on them.
Data consumer groups: C-level, LoB users, data scientists, data engineers
Data Analysis Use Case Framework (customers, product, organization):
- 360-degree customer view
- Understand the market
- Find new markets
- Personalized website/offering
- Improve service
- Co-create & innovate
- Reduce risk/fraud
- Better organize the company
- Understand competition
Needs: accessibility, ease of use, elastic sharing, security, scalability, cost-effectiveness
C-level: CEO, COO, CIO, CTO, CMO…
LoB: Line of Business
Page 19
New Trend
The Rise of the Citizen Data Scientist
Gartner defines the “citizen data scientist.” Characteristics:
- Line of business
- Not a trained data scientist or developer
- Focused on business problems
- Driven to pull together the right data, now
- Iterative workflow: one question leads to the next
- Creates or generates models
- Not typically a member of an analytics …
Alexander Linden, Research Director at Gartner, predicts that through 2017, the number of “Citizen Data Scientists,” i.e. analytical amateurs¹, will grow five times faster than the number of highly skilled Data Scientists.
¹ At the end of his 2007 classic, Competing on Analytics, Tom Davenport predicted the rise of “analytical amateurs.”
Page 20
New Trend
Analytical Professionals (5-10%): can create algorithms
Analytical Semi-Professionals (15-20%): can use visual tools, create simple models
Analytical Amateurs (70-80%): can use spreadsheets
Source: Competing on Analytics, Tom Davenport
Page 21
New Trend
Algorithm Marketplaces Are Bringing the App Economy to Analytics
Source: Gartner (October 2015)
Page 22
New Trend
Easier-to-use analytics tools: smart data discovery
“Smart data discovery is a next-generation data discovery capability that provides insights from advanced analytics to business users or citizen data scientists without requiring them to have traditional data scientist expertise.”
Source: Gartner (June 2015)
Page 23
New Trend
(Figure: Current Data Discovery Analytics Workflow vs. Emerging Smart Data Discovery Analytics Workflow)
Source: Gartner (June 2015)
Page 24
New Trend
DAaaS functional elements (figure): Business User, Citizen Data Scientist, Algorithms, Smart Data Discovery
“… make new sources of information accessible, consumable and meaningful to organizations of all sizes, even ones that don't have extensive advanced analytics skills or in-house resources.”
This material will continue to be updated monthly.