Application architectures with hadoop – big data techcon 2014

69
1 Headline Goes Here Speaker Name or Subhead Goes Here DO NOT USE PUBLICLY PRIOR TO 10/23/12 ApplicaAon Architectures with Hadoop Mark Grover | SoGware Engineer Jonathan Seidman | SoluAons Architect, Partner Engineering April 1, 2014 ©2014 Cloudera, Inc. All Rights Reserved.

description

Deck from presentation at Big Data TechCon Boston 2014 on building applications with Hadoop and tools from the Hadoop ecosystem.

Transcript of Application architectures with hadoop – big data techcon 2014

Page 1: Application architectures with hadoop – big data techcon 2014

1

Headline  Goes  Here  Speaker  Name  or  Subhead  Goes  Here  

DO  NOT  USE  PUBLICLY  PRIOR  TO  10/23/12  

ApplicaAon  Architectures  with  Hadoop  Mark  Grover  |  SoGware  Engineer  Jonathan  Seidman    |  SoluAons  Architect,  Partner  Engineering  April  1,  2014  

©2014 Cloudera, Inc. All Rights Reserved.

Page 2: Application architectures with hadoop – big data techcon 2014

About  Us  • Mark  

•  CommiOer  on  Apache  Bigtop,  commiOer  and  PPMC  member  on  Apache  Sentry  (incubaAng).  

•  Contributor  to  Hadoop,  Hive,  Spark,  Sqoop,  Flume.  •  @mark_grover  

•  Jonathan  •  SoluAons  Architect,  Partner  Engineering  Team.  •  Co-­‐founder  of  Chicago  Hadoop  User  Group  and  Chicago  Big  Data.  •  [email protected]  •  @jseidman  

2 ©2014 Cloudera, Inc. All Rights Reserved.

Page 3: Application architectures with hadoop – big data techcon 2014

Co-­‐authoring  O’Reilly  book  

•  Titled  ‘Hadoop  ApplicaAon  Architectures’  • How  to  build  end-­‐to-­‐end  soluAons  using    Apache  Hadoop  and  related  tools  • Updates  on  TwiOer:  @hadooparchbook  •  hOp://www.hadooparchitecturebook.com  

©2014 Cloudera, Inc. All Rights Reserved. 3

Page 4: Application architectures with hadoop – big data techcon 2014

Challenges  of  Hadoop  ImplementaAon  

4   ©2014 Cloudera, Inc. All Rights Reserved.

Page 5: Application architectures with hadoop – big data techcon 2014

Challenges  of  Hadoop  ImplementaAon  

5   ©2014 Cloudera, Inc. All Rights Reserved.

Page 6: Application architectures with hadoop – big data techcon 2014

6

Click  Stream  Analysis  

Case  Study  

©2014 Cloudera, Inc. All Rights Reserved.

Page 7: Application architectures with hadoop – big data techcon 2014

Click  Stream  Analysis  

7  

Log  Files  

DWH  X  

©2014 Cloudera, Inc. All Rights Reserved.

Page 8: Application architectures with hadoop – big data techcon 2014

Web  Log  Example  

©2014 Cloudera, Inc. All Rights Reserved. 8  

[2012/09/22 20:56:04.294 -0500] "GET /info/ HTTP/1.1" 200 701 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en)" "age=38&gender=1&incomeCategory=5&session=983040389&user=627735038&region=8&userType=1” [2012/09/23 14:12:52.294 -0500] "GET /wish/remove/275 HTTP/1.1" 200 701 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_3; en-us) AppleWebKit/533.16 (KHTML, like Gecko) Version/5.0 Safari/533.16" "age=63&gender=1&incomeCategory=1&session=1561203915&user=1364334488&region=4&userType=1"

Page 9: Application architectures with hadoop – big data techcon 2014

Hadoop  Architectural  ConsideraAons    

•  Storage  managers?  •  HDFS?  HBase?  

•  Data  storage  and  modeling:  •  File  formats?  Compression?  Schema  design?  

•  Data  movement  •  How  do  we  actually  get  the  data  into  Hadoop?  How  do  we  get  it  out?  

•  Metadata  •  How  do  we  manage  data  about  the  data?  

•  Data  access  and  processing  •  How  will  the  data  be  accessed  once  in  Hadoop?  How  can  we  transform  it?  How  do  we  query  it?  

•  OrchestraAon  •  How  do  we  manage  the  workflow  for  all  of  this?  

9 ©2014 Cloudera, Inc. All Rights Reserved.

Page 10: Application architectures with hadoop – big data techcon 2014

10

Data  Storage  and  Modeling  

©2014 Cloudera, Inc. All Rights Reserved.

Page 11: Application architectures with hadoop – big data techcon 2014

Data  Storage  –  Storage  Manager  consideraAons  

•  Popular  storage  managers  for  Hadoop  •  Hadoop  Distributed  File  System  (HDFS)  •  HBase  

11 ©2014 Cloudera, Inc. All Rights Reserved.

Page 12: Application architectures with hadoop – big data techcon 2014

Data  Storage  –  HDFS  vs  HBase  

HDFS  

•  Stores  data  directly  as  files  •  Fast  scans  •  Poor  random  reads/writes  

HBase  

•  Stores  data  as  Hfiles  on  HDFS  •  Slow  scans  •  Fast  random  reads/writes  

12   ©2014 Cloudera, Inc. All Rights Reserved.

Page 13: Application architectures with hadoop – big data techcon 2014

Data  Storage  –  Storage  Manager  consideraAons  

• We  choose  HDFS  •  AnalyAcal  needs  in  this  case  served  beOer  by  fast  scans.  

13 ©2014 Cloudera, Inc. All Rights Reserved.

Page 14: Application architectures with hadoop – big data techcon 2014

14

Data  Storage  Format  

©2014 Cloudera, Inc. All Rights Reserved.

Page 15: Application architectures with hadoop – big data techcon 2014

Data  Storage  –  Format  ConsideraAons    

•  Store  as  plain  text?  •  Sure,  well  supported  by  Hadoop.  •  Text  can  easily  be  processed  by  MapReduce,  loaded  into  Hive  for  analysis,  and  so  on.  

•  But…  • Will  begin  to  consume  lots  of  space  in  HDFS.  • May  not  be  opAmal  for  processing  by  tools  in  the  Hadoop  ecosystem.  

15 ©2014 Cloudera, Inc. All Rights Reserved.

Page 16: Application architectures with hadoop – big data techcon 2014

Data  Storage  –  Format  ConsideraAons    

•  But,  we  can  compress  the  text  files…  •  Gzip  –  supported  by  Hadoop,  but  not  spliOable.  •  Bzip2  –  hey,  spliOable!  Great  compression!  But  decompression  is  slooowww.  

•  LZO  –  spliOable  (with  some  work),  good  compress/de-­‐compress  performance.  Good  choice  for  storing  text  files  on  Hadoop.    

•  Snappy  –  provides  a  good  tradeoff  between  size  and  speed.    

16 ©2014 Cloudera, Inc. All Rights Reserved.

Page 17: Application architectures with hadoop – big data techcon 2014

Data  Storage  –  More  About  Snappy  

•  Designed  at  Google  to  provide  high  compression  speeds  with  reasonable  compression.  

•  Not  the  highest  compression,  but  provides  very  good  performance  for  processing  on  Hadoop.  

•  Snappy  is  not  spliOable  though,  which  brings  us  to…    

17 ©2014 Cloudera, Inc. All Rights Reserved.

Page 18: Application architectures with hadoop – big data techcon 2014

SequenceFile  

• Stores  records  as  binary  key/value  pairs.  

• SequenceFile  “blocks”  can  be  compressed.  

• This  enables  spliOability  with  non-­‐spliOable  compression.      

18   ©2014 Cloudera, Inc. All Rights Reserved.

Page 19: Application architectures with hadoop – big data techcon 2014

Avro  

• Kinda  SequenceFile  on  Steroids.  

• Self-­‐documenAng  –  stores  schema  in  header.  

• Provides  very  efficient  storage.  

• Supports  spliOable  compression.  

19   ©2014 Cloudera, Inc. All Rights Reserved.

Page 20: Application architectures with hadoop – big data techcon 2014

Our  Format  Choices…  

• Avro  with  Snappy  •  Snappy  provides  opAmized  compression.  •  Avro  provides  compact  storage,  self-­‐documenAng  files,  and  supports  schema  evoluAon.  

•  Avro  also  provides  beOer  failure  handling  than  other  choices.  •  SequenceFiles  would  also  be  a  good  choice,  and  are  directly  supported  by  ingesAon  tools  in  the  ecosystem.  

•  But  only  supports  Java.  

20 ©2014 Cloudera, Inc. All Rights Reserved.

Page 21: Application architectures with hadoop – big data techcon 2014

21

HDFS  Schema  Design  

©2014 Cloudera, Inc. All Rights Reserved.

Page 22: Application architectures with hadoop – big data techcon 2014

Recommended  HDFS  Schema  Design  

• How  to  lay  out  data  on  HDFS?  

22 ©2014 Cloudera, Inc. All Rights Reserved.

Page 23: Application architectures with hadoop – big data techcon 2014

Recommended  HDFS  Schema  Design  

/user/<username>  -­‐  User  specific  data,  jars,  conf  files  /etl  –  Data  in  various  stages  of  ETL  workflow  /tmp  –  temp  data  from  tools  or  shared  between  users  /data  –  shared  data  for  the  enAre  organizaAon  /app  –  Everything  but  data:  UDF  jars,  HQL  files,  Oozie  workflows  

23 ©2014 Cloudera, Inc. All Rights Reserved.

Page 24: Application architectures with hadoop – big data techcon 2014

24

Advanced  HDFS  Schema  Design  

©2014 Cloudera, Inc. All Rights Reserved.

Page 25: Application architectures with hadoop – big data techcon 2014

What  is  ParAAoning?  

25

dataset        col=val1/file.txt        col=val2/file.txt          .          .          .        col=valn/file.txt  

dataset      file1.txt      file2.txt          .          .          .        filen.txt  

Un-­‐parAAoned  HDFS  directory  structure  

ParAAoned  HDFS  directory  structure  

©2014 Cloudera, Inc. All Rights Reserved.

Page 26: Application architectures with hadoop – big data techcon 2014

What  is  ParAAoning?  

26

clicks        dt=2014-­‐01-­‐01/clicks.txt        dt=2014-­‐01-­‐02/clicks.txt          .          .          .        dt=2014-­‐03-­‐31/clicks.txt  

clicks      clicks-­‐2014-­‐01-­‐01.txt      clicks-­‐2014-­‐01-­‐02.txt          .          .          .        clicks-­‐2014-­‐03-­‐31.txt  

Un-­‐parAAoned  HDFS  directory  structure  

ParAAoned  HDFS  directory  structure  

©2014 Cloudera, Inc. All Rights Reserved.

Page 27: Application architectures with hadoop – big data techcon 2014

ParAAoning  

•  Split  the  dataset  into  smaller  consumable  chunks  •  Rudimentary  form  of  “indexing”  •  <data  set  name>/<parAAon_column_name=parAAon_column_value>/{files}  

27 ©2014 Cloudera, Inc. All Rights Reserved.

Page 28: Application architectures with hadoop – big data techcon 2014

ParAAoning  consideraAons  

• What  column  to  bucket  by?  •  HDFS  is  append  only.  •  Don’t  have  too  many  parAAons  (<10,000)  •  Don’t  have  too  many  small  files  in  the  parAAons  (more  than  block  size  generally)  

• We  decided  to  parAAon  by  1mestamp  

28 ©2014 Cloudera, Inc. All Rights Reserved.

Page 29: Application architectures with hadoop – big data techcon 2014

What  is  buckeAng?  

29

clicks        dt=2014-­‐01-­‐01/clicks.txt          dt=2014-­‐01-­‐02/clicks.txt  

Un-­‐bucketed  HDFS  directory  structure  

clicks        dt=2014-­‐01-­‐01/file0.txt        dt=2014-­‐01-­‐01/file1.txt        dt=2014-­‐01-­‐01/file2.txt        dt=2014-­‐01-­‐01/file3.txt          dt=2014-­‐01-­‐02/file0.txt        dt=2014-­‐01-­‐02/file1.txt        dt=2014-­‐01-­‐02/file2.txt        dt=2014-­‐01-­‐02/file3.txt  

Bucketed  HDFS  directory  structure  

©2014 Cloudera, Inc. All Rights Reserved.

Page 30: Application architectures with hadoop – big data techcon 2014

BuckeAng  

•  Hash-­‐bucketed  files  within  each  parAAon  based  on  a  parAcular  column  

•  Useful  when  sampling  •  In  some  joins,  pre-­‐reqs:  

•  Datasets  bucketed  on  the  same  key  as  the  join  key  •  Number  of  buckets  are  the  same  or  one  is  a  mulAple  of  the  other  

30 ©2014 Cloudera, Inc. All Rights Reserved.

Page 31: Application architectures with hadoop – big data techcon 2014

BuckeAng  consideraAons?  

• Which  column  to  bucket  on?  •  How  many  buckets?  • We  decided  to  bucket  based  on  cookie  

31 ©2014 Cloudera, Inc. All Rights Reserved.

Page 32: Application architectures with hadoop – big data techcon 2014

De-­‐normalizing  consideraAons  

•  In  general,  big  data  joins  are  expensive  • When  to  de-­‐normalize?  

•  Decided  to  join  the  smaller  dimension  tables  •  Big  fact  tables  are  sAll  joined  

32 ©2014 Cloudera, Inc. All Rights Reserved.

Page 33: Application architectures with hadoop – big data techcon 2014

33

Data  IngesAon  

©2014 Cloudera, Inc. All Rights Reserved.

Page 34: Application architectures with hadoop – big data techcon 2014

File  Transfers    

• “hadoop  fs  –put  <file>”  • Reliable,  but  not  resilient  to  failure.  

• Other  opAons  are  mountable  HDFS,  for  example  NFSv3.  

34   ©2014 Cloudera, Inc. All Rights Reserved.

Page 35: Application architectures with hadoop – big data techcon 2014

Streaming  IngesAon  

•  Flume  •  Reliable,  distributed,  and  available  system  for  efficient  collecAon,  aggregaAon  and  movement  of  streaming  data,  e.g.  logs.  

•  Ka{a  •  Reliable  and  distributed  publish-­‐subscribe  messaging  system.  

35 ©2014 Cloudera, Inc. All Rights Reserved.

Page 36: Application architectures with hadoop – big data techcon 2014

Flume  vs.  Ka{a  

• Purpose  built  for  Hadoop  data  ingest.  

• Pre-­‐built  sinks  for  HDFS,  HBase,  etc.  

• Supports  transformaAon  of  data  in-­‐flight.  

• General  pub-­‐sub  messaging  framework.  

• Hadoop  not  supported,  requires  3rd-­‐party  component  (Camus).  

• Just  a  message  transport  (a  very  fast  one).  

36   ©2014 Cloudera, Inc. All Rights Reserved.

Page 37: Application architectures with hadoop – big data techcon 2014

Flume  vs.  Ka{a  

•  BoOom  line:  •  Flume  very  well  integrated  with  Hadoop  ecosystem,  well  suited  to  ingesAon  of  sources  such  as  log  files.  

•  Ka{a  is  a  highly  reliable  and  scalable  enterprise  messaging  system,  and  great  for  scaling  out  to  mulAple  consumers.  

37 ©2014 Cloudera, Inc. All Rights Reserved.

Page 38: Application architectures with hadoop – big data techcon 2014

A  Quick  IntroducAon  to  Flume  

38  

Flume  Agent  

Source   Channel   Sink   DesAnaAon  External  Source  

Web  Server  TwiOer  JMS  System  logs  …  

Consumes  events  and  forwards  to  channels  

Stores  events  unAl  consumed  by  sinks  –  file,  memory,  JDBC  

Removes  event  from  channel  and  puts  into  external  desAnaAon  

JVM    process  hosAng  components  

©2014 Cloudera, Inc. All Rights Reserved.

Page 39: Application architectures with hadoop – big data techcon 2014

A  Quick  IntroducAon  to  Flume  

•  Reliable  –  events  are  stored  in  channel  unAl  delivered  to  next  stage.  •  Recoverable  –  events  can  be  persisted  to  disk  and  recovered  in  the  event  of  failure.  

39

Flume  Agent  

Source   Channel   Sink   DesAnaAon  

©2014 Cloudera, Inc. All Rights Reserved.

Page 40: Application architectures with hadoop – big data techcon 2014

A  Quick  IntroducAon  to  Flume  

• DeclaraAve    • No  coding  required.  •  ConfiguraAon  specifies  how  components  are  wired  together.  

40   ©2014 Cloudera, Inc. All Rights Reserved.

Page 41: Application architectures with hadoop – big data techcon 2014

A  Brief  Discussion  of  Flume  PaOerns  –  Fan-­‐in  

• Flume  agent  runs  on  each  of  our  servers.  

• These  agents  send  data  to  mulAple  agents  to  provide  reliability.  

• Flume  provides  support  for  load  balancing.  

41   ©2014 Cloudera, Inc. All Rights Reserved.

Page 42: Application architectures with hadoop – big data techcon 2014

A  Brief  Discussion  of  Flume  PaOerns  –  Spli~ng  

• Common  need  is  to  split  data  on  ingest.  

• For  example:  •  Sending  data  to  mulAple  clusters  for  DR.  

•  To  mulAple  desAnaAons.  • Flume  also  supports  parAAoning,  which  is  key  to  our  implementaAon.  

42   ©2014 Cloudera, Inc. All Rights Reserved.

Page 43: Application architectures with hadoop – big data techcon 2014

Sqoop  Overview  

• Apache  project  designed  to  ease  import  and  export  of  data  between  Hadoop  and  external  data  stores  such  as  relaAonal  databases.  

• Great  for  doing  bulk  imports  and  exports  of  data  between  HDFS,  Hive  and  HBase  and  an  external  data  store.  Not  suited  for  ingesAng  event  based  data.  

©2014 Cloudera, Inc. All Rights Reserved. 43

Page 44: Application architectures with hadoop – big data techcon 2014

IngesAon  Decisions  

• Historical  Data  •  Smaller  files:  file  transfer  •  Larger  files:  Flume  with  spooling  directory  source.  

•  Incoming  Data  •  Flume  with  the  spooling  directory  source.  

44 ©2014 Cloudera, Inc. All Rights Reserved.

Page 45: Application architectures with hadoop – big data techcon 2014

45

Data  Processing  and  Access  

©2014 Cloudera, Inc. All Rights Reserved.

Page 46: Application architectures with hadoop – big data techcon 2014

Data  flow  

46  

Raw  data  

ParAAoned  clickstream  

data  

Other  data  (Financial,  CRM,  etc.)  

Aggregated  dataset  #2  

Aggregated  dataset  #1  

©2014 Cloudera, Inc. All Rights Reserved.

Page 47: Application architectures with hadoop – big data techcon 2014

Data  processing  tools  

47  

• Hive  •  Impala  •  Pig,  etc.  

©2014 Cloudera, Inc. All Rights Reserved.

Page 48: Application architectures with hadoop – big data techcon 2014

Hive  

48  

• Open  source  data  warehouse  system  for  Hadoop  •  Converts  SQL-­‐like  queries  to  MapReduce  jobs  

• Work  is  being  done  to  move  this  away  from  MR  •  Stores  metadata  in  Hive  metastore  •  Can  create  tables  over  HDFS  or  HBase  data  • Access  available  via  JDBC/ODBC  

©2014 Cloudera, Inc. All Rights Reserved.

Page 49: Application architectures with hadoop – big data techcon 2014

Impala  

49  

•  Real-­‐Ame  open  source  SQL  query  engine  for  Hadoop  • Doesn’t  build  on  MapReduce  • WriOen  in  C++,  uses  LLVM  for  run-­‐Ame  code  generaAon  •  Can  create  tables  over  HDFS  or  HBase  data  • Accesses  Hive  metastore  for  metadata  • Access  available  via  JDBC/ODBC  

©2014 Cloudera, Inc. All Rights Reserved.

Page 50: Application architectures with hadoop – big data techcon 2014

Pig  

50  

• Higher  level  abstracAon  over  MapReduce  (like  Hive)  • Write  transformaAons  in  scripAng  language  –  Pig  LaAn  •  Can  access  Hive  metastore  via  HCatalog  for  metadata  

©2014 Cloudera, Inc. All Rights Reserved.

Page 51: Application architectures with hadoop – big data techcon 2014

Data  Processing  consideraAons  

51  

• We  chose  Hive  for  ETL    and  Impala  for  interac1ve  BI.  

©2014 Cloudera, Inc. All Rights Reserved.

Page 52: Application architectures with hadoop – big data techcon 2014

52

Metadata  Management  

©2014 Cloudera, Inc. All Rights Reserved.

Page 53: Application architectures with hadoop – big data techcon 2014

What  is  Metadata?  

53  

• Metadata  is  data  about  the  data  •  Format  in  which  data  is  stored  •  Compression  codec  •  LocaAon  of  the  data  •  Is  the  data  parAAoned/bucketed/sorted?  

©2014 Cloudera, Inc. All Rights Reserved.

Page 54: Application architectures with hadoop – big data techcon 2014

Metadata  in  Hive  

54

Hive  Metastore  

©2014 Cloudera, Inc. All Rights Reserved.

Page 55: Application architectures with hadoop – big data techcon 2014

Metadata  

55  

• Hive  metastore  has  become  the  de-­‐facto  metadata  repository  • HCatalog  makes  Hive  metastore  accessible  to  other  applicaAons  (Pig,  MapReduce,  custom  apps,  etc.)  

©2014 Cloudera, Inc. All Rights Reserved.

Page 56: Application architectures with hadoop – big data techcon 2014

Hive  +  HCatalog  

56   ©2014 Cloudera, Inc. All Rights Reserved.

Page 57: Application architectures with hadoop – big data techcon 2014

57

OrchestraAon  

©2014 Cloudera, Inc. All Rights Reserved.

Page 58: Application architectures with hadoop – big data techcon 2014

OrchestraAon  

• Once  the  data  is  in  Hadoop,  we  need  a  way  to  manage  workflows  in  our  architecture.  

•  Scheduling  and  tracking  MapReduce  jobs,  Hive  jobs,  etc.  •  Several  opAons  here:  

•  Cron  •  Oozie,  Azkaban  •  3rd-­‐party  tools,  Talend,  Pentaho,  InformaAca,  enterprise  schedulers.  

58 ©2014 Cloudera, Inc. All Rights Reserved.

Page 59: Application architectures with hadoop – big data techcon 2014

Oozie  

• Supports  defining  and  execuAng  a  sequence  of  jobs.  

• Can  trigger  jobs  based  on  external  dependencies  or  schedules.  

59   ©2014 Cloudera, Inc. All Rights Reserved.

Page 60: Application architectures with hadoop – big data techcon 2014

60

Final  Architecture  

©2014 Cloudera, Inc. All Rights Reserved.

Page 61: Application architectures with hadoop – big data techcon 2014

Final  Architecture  –  High  Level  Overview  

61  

Data  Sources   IngesAon  

Data  Storage/Processing  

Data  ReporAng/Analysis  

©2014 Cloudera, Inc. All Rights Reserved.

Page 62: Application architectures with hadoop – big data techcon 2014

Final  Architecture  –  High  Level  Overview  

62  

Data  Sources   IngesAon  

Data  Storage/Processing  

Data  ReporAng/Analysis  

©2014 Cloudera, Inc. All Rights Reserved.

Page 63: Application architectures with hadoop – big data techcon 2014

Final  Architecture  –  IngesAon  

63  

Web  App   Avro  Agent  Web  App   Avro  Agent  

Web  App   Avro  Agent  Web  App   Avro  Agent  

Web  App   Avro  Agent  Web  App   Avro  Agent  

Web  App   Avro  Agent  Web  App   Avro  Agent  

Flume  Agent  

Flume  Agent  

Flume  Agent  

Flume  Agent  

Fan-­‐in    PaOern  

MulA  Agents  for    Failover  and  rolling  restarts  

HDFS    

©2014 Cloudera, Inc. All Rights Reserved.

Page 64: Application architectures with hadoop – big data techcon 2014

Final  Architecture  –  High  Level  Overview  

64  

Data  Sources   IngesAon  

Data  Storage/Processing  

Data  ReporAng/Analysis  

©2014 Cloudera, Inc. All Rights Reserved.

Page 65: Application architectures with hadoop – big data techcon 2014

Final  Architecture  –  Storage  and  Processing  

65  

/etl/weblogs/20140331/  /etl/weblogs/20140401/  …  

Data  Processing  /data/markeAng/clickstream/bouncerate/  /data/markeAng/clickstream/aOribuAon/  …  

©2014 Cloudera, Inc. All Rights Reserved.

Page 66: Application architectures with hadoop – big data techcon 2014

Final  Architecture  –  High  Level  Overview  

66  

Data  Sources   IngesAon  

Data  Storage/Processing  

Data  ReporAng/Analysis  

©2014 Cloudera, Inc. All Rights Reserved.

Page 67: Application architectures with hadoop – big data techcon 2014

Final  Architecture  –  Data  Access  

67  

Hive/Impala  

BI/AnalyAcs  Tools  

DWH  Sqoop  

Local  Disk  

R,  etc.  

DB  import  tool  

JDBC/ODBC  

©2014 Cloudera, Inc. All Rights Reserved.

Page 68: Application architectures with hadoop – big data techcon 2014

Contact  info  • Mark  Grover  

• @mark_grover  •  www.linkedin.com/in/grovermark  

•  Jonathan  Seidman  •  [email protected]  • @jseidman  •  hOps://www.linkedin.com/pub/jonathan-­‐seidman/1/26a/959  •  hOp://www.slideshare.net/jseidman  

•  Slides  at  slideshare.net/hadooparchbook  

68 ©2014 Cloudera, Inc. All Rights Reserved.

Page 69: Application architectures with hadoop – big data techcon 2014

69 ©2014 Cloudera, Inc. All Rights Reserved.