1. Copyright CELLANT Corp. All Rights Reserved. h t t p : / / w
w w . c e l l a n t . j p / 1 1 HiveImpala HiveHadoop 3 Hadoop
2013724 R&D
2. Copyright CELLANT Corp. All Rights Reserved. h t t p : / / w
w w . c e l l a n t . j p / 2
3. Copyright CELLANT Corp. All Rights Reserved. h t t p : / / w
w w . c e l l a n t . j p / v@sudabon) v R&D v l19972004 NEC /
l20052006 BIGLOBE / BtoB l2012/8 vHadoop1 vHadoopHive 3
4. Copyright CELLANT Corp. All Rights Reserved. h t t p : / / w
w w . c e l l a n t . j p / v MOBYLOG l Web l 200512 l PC l MOBYLOG
ENGINE OmnitureAdobeSiteCatalystWebTrends WebTrends AnalyticsOEM v
Xdata collect l l 20135 l Web CSV l 50 4
5. Copyright CELLANT Corp. All Rights Reserved. h t t p : / / w
w w . c e l l a n t . j p / vMOBYLOG l v lEC l l Hadoop 5
6. Copyright CELLANT Corp. All Rights Reserved. h t t p : / / w
w w . c e l l a n t . j p / v Hadoop v lAmazon EMR AliveTransient
lHadoop Streaming S3EC2PHP Hadoop1 6
7. Copyright CELLANT Corp. All Rights Reserved. h t t p : / / w
w w . c e l l a n t . j p / vAWSR Hadoop Hadoop2 7 ClouderaHadoop
CDH Cloudera ManagerGUI HiveJavaSQL MapReduce
8. Copyright CELLANT Corp. All Rights Reserved. h t t p : / / w
w w . c e l l a n t . j p / v lHive lHive vHive/MapReduce Cloudera
Impala Hive745 Hadoop3 8
9. Copyright CELLANT Corp. All Rights Reserved. h t t p : / / w
w w . c e l l a n t . j p / Impala 9 2012/10/24 0.1 beta 2012/11/14
0.2 beta 2012/12/05 0.3 beta 2013/01/19 0.4 beta 2013/02/05 0.5
beta 2013/02/27 0.6 beta 2013/04/15 0.7 beta 2013/05/01 1.0
2013/06/18 1.0.1 2012/11/02 5 slideshare 2012/12/07 14 @IT
slideshare 2013/03/06 14 slideshare 2013/05/01 14 slideshare
HiBench RCFile mem_limitParquet
10. Copyright CELLANT Corp. All Rights Reserved. h t t p : / /
w w w . c e l l a n t . j p / Our System Environment 10 v Install
using Cloudera Manager Free Edition 4.5.2 Master Slave 11 Servers
All servers are connected with 1Gbps Ethernet through an L2 switch
Active NameNode DataNode TaskTracker Impalad Stand-by NameNode
JobTracker statestored 3 Servers DataNode TaskTracker Impalad
DataNode TaskTracker Impalad DataNode TaskTracker Impalad DataNode
TaskTracker Impalad DataNode TaskTracker Impalad DataNode
TaskTracker Impalad DataNode TaskTracker Impalad DataNode
TaskTracker Impalad DataNode TaskTracker Impalad DataNode
TaskTracker Impalad
11. Copyright CELLANT Corp. All Rights Reserved. h t t p : / /
w w w . c e l l a n t . j p / vCPU lIntel Core 2 Duo 2.13 GHz with
Hyper Threading vMemory l4GB vDisk l7,200 rpm SATA mechanical Hard
Disk Drive * 1 vOS lCent OS 6.2 Our Server Specication 11
12. Copyright CELLANT Corp. All Rights Reserved. h t t p : / /
w w w . c e l l a n t . j p / v Use CDH4.2.1 + Impala 1.0 v Use
hivebench in open-sourced benchmark tool HiBench l
https://github.com/hibench v Modied datasets to 1/10 scale l
Default conguration generates table with 1 billion rows v Modied
query sentence l Deleted INSERT INTO TABLE to evaluate read-only
performance v Combines a few storage format with a few compression
method l TextFile, SequenceFile, RCFile, ParquestFile l No
compression, Gzip, Snappy v Comparison with job query latency v
Average job latency over 5 measurements Benchmark 12
13. Copyright CELLANT Corp. All Rights Reserved. h t t p : / /
w w w . c e l l a n t . j p / v Uservisits table l 100 million rows
l 16,895 MB as TextFile l Table Denitions sourceIP string destURL
string visitDate string adRevenue double userAgent string
countryCode string languageCode string searchWord string duration
int Modied Datasets 13 v Rankings table l 12 million rows l 744 MB
as TextFile l Table Denitions pageURL string pageRank int
avgDuration int
14. Copyright CELLANT Corp. All Rights Reserved. h t t p : / /
w w w . c e l l a n t . j p / SELECT sourceIP, sum(adRevenue) as
totalRevenue, avg(pageRank) FROM rankings_t R JOIN ( SELECT
sourceIP, destURL, adRevenue FROM uservisits_t UV WHERE
(datedi(UV.visitDate, '1999-01-01')>=0 AND datedi(UV.visitDate,
'2000-01-01')