Cloudera Helps Advance Taiwan's Big Data Industry

© Cloudera, Inc. All rights reserved. Cloudera Helps Advance Taiwan's Big Data Industry. Kai X. Miao (苗凯翔), Vice President, Cloudera Corporation

Transcript of Cloudera Helps Advance Taiwan's Big Data Industry

Page 1: Cloudera Helps Advance Taiwan's Big Data Industry

Cloudera Helps Advance Taiwan's Big Data Industry

Kai X. Miao (苗凯翔), Vice President, Cloudera Corporation

Page 2: Cloudera Helps Advance Taiwan's Big Data Industry

Big Data Is Only Getting Bigger: Particularly Relevant in the Telecom Space

Data growth, 1980 to today

•  Structured data: 10%

•  Complex data: 90%

Data sources: user profiles, usage data, mobile & devices, network, marketing & CRM, public & trade

3rd Platform clients: rich user experiences, IoT clients

•  In 2012, the world had 2.8 ZB of data

•  By 2020, world data will reach 40 ZB
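The two figures on the slide (2.8 ZB in 2012, 40 ZB by 2020) imply a yearly growth rate. The sketch below works it out, assuming smooth compound growth over the eight years. That assumption and the function name are illustrative and do not come from the deck.

```python
def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Compound annual growth rate that takes start_zb to end_zb in `years` years."""
    return (end_zb / start_zb) ** (1 / years) - 1


# Figures quoted on the slide: 2.8 ZB in 2012, 40 ZB projected for 2020.
rate = implied_annual_growth(2.8, 40.0, 2020 - 2012)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption the world's data grows by a little under 40% each year, which roughly doubles the total every two years.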

Page 3: Cloudera Helps Advance Taiwan's Big Data Industry

Traditional Data Architecture Can't Handle Big Data

Components: instrumentation, collection, storage grid (original raw data), ETL compute grid, RDBMS/EDW, BI reports + interactive apps

•  Can't explore original raw data

•  Can't scale

•  Sends data to the graveyard

Page 4: Cloudera Helps Advance Taiwan's Big Data Industry

A Major Limitation of RDBMS/EDW: Schema-on-Write

•  A schema must be created before any data can be loaded

•  An explicit load operation has to take place, transforming the data into the database's internal structure

•  New columns must be added explicitly before data for those columns can be added to the database
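The three constraints above can be seen in a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for an RDBMS/EDW. The table, column names, and sample records are hypothetical. The schema-on-read half at the end is added only for contrast and is not described on this slide.

```python
import json
import sqlite3

# Schema-on-write: the table layout must exist before any row is loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id TEXT, temperature REAL)")
conn.execute("INSERT INTO readings VALUES (?, ?)", ("d1", 21.5))

insert_with_humidity = (
    "INSERT INTO readings (device_id, temperature, humidity) VALUES (?, ?, ?)"
)

# A record carrying a new field is rejected until the column is added.
try:
    conn.execute(insert_with_humidity, ("d2", 22.0, 0.4))
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# The new column has to be added explicitly, then the load succeeds.
conn.execute("ALTER TABLE readings ADD COLUMN humidity REAL")
conn.execute(insert_with_humidity, ("d2", 22.0, 0.4))
rows = conn.execute(
    "SELECT device_id, humidity FROM readings ORDER BY device_id"
).fetchall()

# Schema-on-read, for contrast: raw records are stored untouched and
# structure is applied only when they are read.
raw_lines = [
    '{"device_id": "d1", "temperature": 21.5}',
    '{"device_id": "d2", "temperature": 22.0, "humidity": 0.4}',
]
humidity_values = [json.loads(line).get("humidity") for line in raw_lines]
```

In the schema-on-read half, the record with the extra field is ingested without any prior change, and a missing field simply reads back as `None`.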

Page 5: Cloudera Helps Advance Taiwan's Big Data Industry

Expanding Data Requires a New Approach

1980s: Bring Data to Compute

Process-centric businesses use:

•  Structured data mainly

•  Internal data only

•  "Important" data only

Now: Bring Compute to Data

Information-centric businesses use all data: multi-structured, internal & external data of all types

(Diagram: relative size & complexity of data versus compute in each era.)

Page 6: Cloudera Helps Advance Taiwan's Big Data Industry

Hadoop Changes the Way Data Is Processed

Traditional approach: $30,000+ per TB

•  Hard to scale

•  Network is a bottleneck

•  Only handles relational data

•  Difficult to add new fields & data types

Expensive, proprietary, "reliable" servers; expensive software licenses

Architecture: data storage (SAN, NAS) connected over the network to compute (RDBMS, EDW)

Hadoop approach: $300 - $1,000 per TB

•  Scales out forever

•  No bottlenecks

•  Easy to ingest any data

•  Agile data access

Inexpensive PC servers; cheap, open-source software

Architecture: each node combines compute (CPU), memory, and storage (disk)
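The per-terabyte prices on this slide can be turned into a cost ratio. The sketch below does only that arithmetic. The constant and function names are illustrative. Because the traditional figure is quoted as "$30,000+", the resulting factors are lower bounds.

```python
TRADITIONAL_USD_PER_TB = 30_000    # slide: "$30,000+ per TB" (a floor)
HADOOP_USD_PER_TB = (300, 1_000)   # slide: "$300 - $1,000 per TB"


def savings_factor_range(traditional: float, hadoop_range: tuple) -> tuple:
    """Return (smallest, largest) cost ratio of traditional vs. Hadoop storage."""
    low_cost, high_cost = hadoop_range
    return traditional / high_cost, traditional / low_cost


min_factor, max_factor = savings_factor_range(
    TRADITIONAL_USD_PER_TB, HADOOP_USD_PER_TB
)
print(f"Traditional storage costs {min_factor:.0f}x to {max_factor:.0f}x more per TB")
```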

Page 7: Cloudera Helps Advance Taiwan's Big Data Industry

A Strong Track Record of Innovation

2008: Cloudera founded by Mike Olson, Amr Awadallah & Jeff Hammerbacher

2009: Hadoop creator Doug Cutting joins Cloudera

2009: Cloudera releases CDH, the first commercial Apache Hadoop distribution

2010: Cloudera Manager, the first management application for Hadoop

2011: Cloudera reaches 100 production customers

2011: Cloudera University expands to 140 countries

2012: Cloudera Enterprise 4, the standard for Hadoop in the enterprise

2012: Cloudera Connect reaches 300 partners

2013: Cloudera Impala, Cloudera Navigator, Cloudera Search

2013: Tom Reilly joins as CEO

Over 800 partners in Cloudera Connect

2014: The enterprise data hub launched

2014: Series F funding with Intel as key partner

Over 900 partners in Cloudera Connect

2014: Cloudera Enterprise 5

Products pictured: CDH, Cloudera Manager, Cloudera Enterprise 4, Enterprise Data Hub, Cloudera Enterprise 5. Tagline: Ask Bigger Questions.

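Page 8: Cloudera Helps Advance Taiwan's Big Data Industry

Cloudera Company Overview

Founded: 2008, co-founded by former employees of several companies (shown as logos on the slide)

Employees: more than 900

World-class technical support: 24x7 global staff; proactive and predictive support programs

Mission critical: thousands of enterprise users; several hundred paying customers

Broadest ecosystem: more than 1,400 business partners

Cloudera University: more than 100,000 people trained

Open-source leader: Cloudera employees are industry-leading developers and providers

Our partnership with Intel will enable us to open up the market successfully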

Page 9: Cloudera Helps Advance Taiwan's Big Data Industry

✔ Open Source   ✔ Scalable   ✔ Flexible   ✔ Cost-Effective

✖ Managed   ✖ Open Architecture   ✖ Secure and Governed

CLOUDERA'S ENTERPRISE DATA HUB

3rd-party apps

•  Batch processing: MapReduce

•  Analytic SQL: Impala

•  Search engine: Solr

•  Machine learning: Spark

•  Stream processing: Spark Streaming

Workload management: YARN

Storage for any type of data (unified, elastic, resilient, secure)

•  Filesystem: HDFS

•  Online NoSQL: HBase

Data management: Cloudera Navigator

System management: Cloudera Manager

Sentry

Data sources: DBMS, sensors, logs

Ingestion: Sqoop, Flume

Page 10: Cloudera Helps Advance Taiwan's Big Data Industry

The Modern Information Architecture

Data sources: sys logs, web logs, files, RDBMS

Enterprise data hub, with Cloudera Manager and Cloudera Navigator

Connected systems: web/mobile application, enterprise data warehouse, enterprise reporting, BI / analytics, data modeling, developer SDKs

Users: security admins, system admins, engineers, data scientists, analysts, business users, customers & end users

Page 11: Cloudera Helps Advance Taiwan's Big Data Industry

Customer Success Across Industries

Financial & Business Services, Telecom, Technology, Healthcare, Life Sciences, Media, Retail, Consumer, Energy, Public Sector

Page 12: Cloudera Helps Advance Taiwan's Big Data Industry

Categories of Cloudera Big Data Use Cases

Customer 360-degree analysis

•  Enhanced customer experience & support

•  Personalization, targeted offerings, loyalty programs

•  Sentiment analysis

Channel optimization

•  Campaign management

•  Selection process optimization

Supply chain optimization

•  Manufacturing process efficiency

•  Supplier/merchant management

Risk management

•  Fraud detection

•  Intrusion detection & digital forensics

Audit

•  Regulatory compliance (retention, privacy)

•  Usage analysis and mediation

•  e-Discovery

Market intelligence

•  Competitive analysis

•  Economic factor analysis

•  Customer segmentation

Data services

•  Data as-a-product

•  Data enriched with insights/inferences

Page 13: Cloudera Helps Advance Taiwan's Big Data Industry

Where Does Manufacturing Data Come From?

Devices & sensors

•  Device Readings  •  Device Performance  •  Device Diagnostics  •  Battery / Power Consumption

•  Software Logs  •  Environmental Interactions

•  R&D  •  Quality / Testing

Factory & operations

•  MES  •  Sensors  •  Video / Surveillance  •  Line Productivity  •  Machines  •  Staffing / Scheduling

Supply chain & inventory

•  ERP  •  Supplier / Manufacturer  •  Orders / Receivables  •  Commodity Supplies / Prices

Marketing & CRM

•  Transactions  •  Accounts  •  Warranties / Aftermarket

•  Customer Service Logs  •  Campaigns / Promotions

•  Website / SEO  •  Affiliates / Merchants  •  Surveys  •  Competitive Intelligence

Public & trade

•  Market Intelligence  •  Policy / Regulation  •  Demographic / Census  •  Psychographic  •  Inflation / Macroeconomic  •  Gas Prices  •  Labor Statistics  •  Social / Search  •  Public Health Data  •  Clinical Studies  •  Store Schematics  •  Journals / Editorial  •  Seismic / Speculation

Page 14: Cloudera Helps Advance Taiwan's Big Data Industry

Optimizing Operations: Chevron

Challenge

•  Reduce the cost of sending deepwater drillships out into the ocean ($1M/day)

•  Do a better job of processing the vast amounts of data that can help identify reservoirs of oil (0.5 PB)

Solution

•  Chevron gathers information in five dimensions: the x and y coordinates of both the wave's source and target, along with the time it was collected.

•  Construct a picture of what the terrain looks like under the ocean floor

•  The company uses CDH to sort that data.

Benefit

•  The more data Chevron can collect, the better it can find pockets of oil and natural gas underground.

•  Hadoop can do some of the seismic data processing in a less expensive way: 10x less than traditional technologies on average.

Chevron is reducing its cost of sending deepwater drillships into the ocean by more precisely identifying oil reservoirs.

Page 15: Cloudera Helps Advance Taiwan's Big Data Industry

Automotive & Industrial

Use Case: Proactive Quality Assurance. Build machine learning algorithms that identify production anomalies prior to field testing and find performance flaws that could not be identified in R&D.

Problem: Silos Limit Options. Legacy systems hold historical data from production line telemetry, factory surveillance and sensors, call centers, in-car telematics, etc. That data is useless if it is kept offline and in silos.

Solution: Anomaly Detection. Spark includes MLlib, a library of machine learning algorithms for large data, enabling clustering to identify outliers from typical production patterns.

Background: Caterpillar. Caterpillar Inc. is headquartered in Illinois, USA. It is one of the world's largest manufacturers of construction machinery and mining equipment, gas engines, and industrial gas turbines, and also one of the world's largest diesel engine manufacturers.
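The anomaly-detection slide above describes clustering to identify outliers from typical production patterns. The sketch below is one way to read that idea, written in plain Python rather than against the Spark MLlib API. The sensor readings, the two-cluster choice, and the distance threshold are all hypothetical values chosen for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from existing ones.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


def find_outliers(points, centroids, threshold):
    """Points farther than `threshold` from every cluster centre."""
    return [p for p in points if min(math.dist(p, c) for c in centroids) > threshold]


# Hypothetical (temperature, vibration) readings: two normal operating
# modes plus one reading that fits neither.
readings = [
    (10.0, 1.0), (10.2, 1.1), (9.8, 0.9), (10.1, 1.0),
    (20.0, 2.0), (20.2, 2.1), (19.8, 1.9), (20.1, 2.0),
    (15.0, 9.0),
]
centroids = kmeans(readings, k=2)
outliers = find_outliers(readings, centroids, threshold=3.0)
print(outliers)
```

A fixed threshold keeps the sketch short. In practice the cut-off would have to be tuned against the spread of the normal readings.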

Page 16: Cloudera Helps Advance Taiwan's Big Data Industry

Telco Consumer Profile

•  Contact, credit info, date of renewal

•  Device type: phone, mobile broadband, tablet

•  Data/voice usage and top-up

•  App preference, interests, usage

•  Usage trends: time of day, data amounts

•  Location

•  Website usage

•  Social networks: like/dislike, profile info

Page 17: Cloudera Helps Advance Taiwan's Big Data Industry

Customer 360° View

Use Case: Actionable Sentiment Analysis. Isolate customer profiles to personalize the mix of plans, services, and offers based on the convergence of information from network, GPS, social, call centers, accounts, etc.

Problem: Can't Scale Beyond Silos. Current systems cannot integrate social, telemetric, and systems data in real time with historical data to tailor product mix and incentive plans to the user.

Solution: Calculate Anything. HBase is a real-time database accommodating complex historic data. Spark and Impala converge ETL, analytics, and reporting for on-demand modeling.

Page 18: Cloudera Helps Advance Taiwan's Big Data Industry

Where Is the Financial Services Data? Mapping and Consolidation Are the Tip of the Iceberg for Big Data

Retail Banking

•  Bank Transactions  •  Customer Data  •  ATM Activity  •  Online Activity  •  Mobile Activity  •  Demographic / Census Data

•  Marketing / CRM  •  Social / Sentiment

Credit Cards & Payments

•  Card Transactions  •  Customer Data  •  Online Activity  •  Demographic / Census Data

•  Marketing / CRM  •  Integration with Retailers / Loyalty

•  Social / Sentiment

Investment Banking

•  Trade Data  •  Customer Data  •  Web Logs  •  Research / Publications  •  Market Data  •  Communications / Documentation

Insurance

•  Claims / Policy Data  •  Customer Data  •  Demographic / Census Data

•  Weather Data  •  Vehicle Telemetry  •  Video / Surveillance  •  Sensors  •  Internet of Things

Services & SROs

•  Trade Data  •  Communications / Documentation

•  Market Data  •  Research / Publications  •  Surveys

Page 19: Cloudera Helps Advance Taiwan's Big Data Industry

Customer Spotlight: Allstate

Challenge: Data silos spread across a company with 80+ years' history

•  Analysis on 1 state takes 24 hours

•  Can't analyze all 50 states at once

Solution: Universal data archive on Cloudera

•  Supports storage, ETL, applied math

Benefit: Holistic analysis on all 50 states in 16 hours

•  75X faster performance

Combining 80+ years of data across all business units & all 50 states.
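The 75X figure on this slide is consistent with the other numbers quoted there. The sketch below reproduces it, assuming the old baseline is 50 single-state analyses run one after another. That reading is an assumption, since the slide says only that all 50 states could not be analyzed at once.

```python
HOURS_PER_STATE_BEFORE = 24   # slide: "Analysis on 1 state takes 24 hours"
STATES = 50                   # slide: "all 50 states"
HOURS_ALL_STATES_AFTER = 16   # slide: "all 50 states in 16 hours"

# Assumed baseline: each state analysed separately, back to back.
serial_hours_before = HOURS_PER_STATE_BEFORE * STATES
speedup = serial_hours_before / HOURS_ALL_STATES_AFTER

print(f"{serial_hours_before} h before vs {HOURS_ALL_STATES_AFTER} h after: {speedup:.0f}x")
```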

Page 20: Cloudera Helps Advance Taiwan's Big Data Industry

Thank  you!