Post on 22-Jan-2018
1 Cloudera, Inc. All rights reserved.
Hadoop
20114ClouderaCloudera
email: sho@cloudera.com twitter: @shiumachi
(EDH)
Sqoop, Flume
MapReduce, Hive, Pig, Spark
Impala
Solr
SAS, R, Spark, Mahout
NoSQL: HBase
Spark Streaming
HDFS, HBase
YARN, Cloudera Manager, Cloudera Navigator
DISCLAIMER: Hadoop, EDH, DWH
Cloudera, Cloudera
(HA)
:
(1)
[Diagram (1), built up over several slides: tar.gz files; HDFS put; HDFS (tar.gz); MapReduce; HDFS (Avro); Hive; HDFS (RCFile); Hive; Flume Source; Flume Sink (HDFS Sink); HDFS (SequenceFile)]
2009-2012: Hadoop = MapReduce; Java, Hive, Pig
Hive, BI

MapReduce: Hadoop
HDFS: 2012
Hive: SQL, MapReduce; Pig
Avro
RCFile, Parquet
Flume: Source - Channel - Sink (3 components); Source, Sink
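The Source - Channel - Sink structure named for Flume above can be illustrated with a small toy model. This is only a sketch in plain Python, not Flume's actual API or configuration: the names `Channel`, `run_source`, and `run_sink` are hypothetical, and real Flume channels are transactional and configured through agent properties rather than code like this.

```python
from collections import deque


class Channel:
    """Bounded in-memory buffer between a source and a sink.

    Hypothetical stand-in for a Flume channel; not transactional.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._events = deque()

    def put(self, event):
        """Accept an event, or return False when the buffer is full."""
        if len(self._events) >= self.capacity:
            return False
        self._events.append(event)
        return True

    def take_batch(self, max_events):
        """Remove and return up to max_events events in arrival order."""
        batch = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch

    def __len__(self):
        return len(self._events)


def run_source(lines, channel):
    """Source side: push each line into the channel.

    Returns the lines the channel rejected because it was full.
    """
    rejected = []
    for line in lines:
        if not channel.put(line):
            rejected.append(line)
    return rejected


def run_sink(channel, store, batch_size):
    """Sink side: drain the channel in batches into `store`.

    `store` is a plain list standing in for a destination such as HDFS.
    Returns the number of events delivered.
    """
    delivered = 0
    while True:
        batch = channel.take_batch(batch_size)
        if not batch:
            return delivered
        store.extend(batch)
        delivered += len(batch)


if __name__ == "__main__":
    channel = Channel(capacity=3)
    rejected = run_source(["a", "b", "c", "d"], channel)
    store = []
    delivered = run_sink(channel, store, batch_size=2)
    print(rejected, store, delivered)
```

The point of the sketch is the decoupling: the source only knows about the channel, the sink only knows about the channel and its destination, and the channel's capacity is what absorbs a difference in speed between the two.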
(2) BI
[Diagram (2), built up over several slides: the components of diagram (1), plus Hive; HDFS (Parquet); HBase; HBase get/put API; Impala; BI]
(2) BI: 2012; Impala, BI, Hadoop, Parquet, HBase, BI, HBase + Parquet
(HBase)

BI
Impala: October 2012, Hadoop, SQL, Hadoop, MapReduce
Parquet: Cloudera, Twitter
HBase: NoSQL, HBase, 2009, Impala

Parquet
HBase
Parquet + HBase
(Parquet)
(HBase)
(HBase)
(HBase)
(3) Spark, EDH
[Diagram (3), built up over several slides: the components of diagram (2), plus Solr; Flume Sink (NRT); Lily HBase Indexer (NRT); Solr; Spark]
(3) Spark, EDH: 2013; Cloudera Search, Hadoop, Spark, SQL

Spark, EDH
Solr: OSS, Solr, Hadoop, Cloudera, Solr, Cloudera Search, OSS
Lily HBase Indexer: HBase, Solr
Spark: MapReduce, API
(4) Kafka
[Diagram (4), built up over several slides: the components of diagram (3), plus Kafka Broker; Flume Source (Kafka Source); Kafka Producer (Producer API); Flume Sink (HBase Sink); Spark Streaming]
(4) Kafka: 2015; Kafka, Spark Streaming, end-to-end

Kafka: Kafka, Flume, Kafka, 1
Spark Streaming: Spark

SLA

SLA, 1, SLA ()

SLA
1: SLA, Impala, 51
2: SLA, Hadoop, Hadoop, Hadoop
3: SLA

end-to-end SLA
Hadoop
SLA
Impala, Parquet, Flume (Parquet)
Impala
HBase()
Impala()
Impala
Hadoop, Hadoop
Hadoop, end-to-end

Parquet, Impala
HBase
Spark, MapReduce

[Diagram, full pipeline: tar.gz files; HDFS put; HDFS (tar.gz, Avro, RCFile, SequenceFile, Parquet); MapReduce; Hive; Impala; BI; HBase (get/put API); Solr; Lily HBase Indexer (NRT); Flume Source (Kafka Source); Flume Sink (HDFS Sink, HBase Sink, NRT); Kafka Broker; Kafka Producer (Producer API); Spark; Spark Streaming]
SLA, SLA, end-to-end SLA
SLA

We are hiring!
career-jp@cloudera.com
Thank you