OLAP options on Hadoop
Transcript of OLAP options on Hadoop
![Page 1: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/1.jpg)
OLAP options on Hadoop
Yuta Imai, Hortonworks
Jul 15, 2016
![Page 2: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/2.jpg)
© Hortonworks Inc. 2011–2016. All Rights Reserved
Hortonworks company overview
IPO 4Q14 (NASDAQ: HDP)
![Page 3: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/3.jpg)
Hortonworks Data Platform
Hortonworks Data Platform 2.4
[Architecture diagram]
Governance: Apache Falcon, Apache Sqoop, Apache Flume, Apache Kafka, Apache Atlas
Batch, interactive & real-time data access: MapReduce, Apache Hive, Apache Pig, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm, ISV engines
YARN: Data Operating System (cluster resource management)
HDFS (Hadoop Distributed File System)
Operations: Apache Ambari, Apache ZooKeeper, Apache Oozie, Cloudbreak
Security: Apache Ranger, Apache Knox, Apache Atlas, HDFS Encryption
Deployment choice: Linux, Windows, on-premises, cloud
![Page 4: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/4.jpg)
Connected Data Platform
Expanding the possibilities of Hadoop with Hortonworks DataFlow
Hortonworks DataFlow, powered by Apache NiFi: dynamic insights where freshness matters
Hortonworks Data Platform, powered by Apache Hadoop: static insights from historical data
[Diagram labels: Internet of Anything, Enrich Context, Store Data and Metadata]
Together, Hortonworks DataFlow and Hortonworks Data Platform provide an end-to-end solution for a big data platform.
![Page 5: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/5.jpg)
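Contributions to the Apache community
More than 1,500 ecosystem partners (Hortonworks technology partners)
A team of experts: made up of core members deeply involved in development
→ Many committers are Hortonworks employees. About one third of the committers on the Apache Hadoop project work for Hortonworks, and they are involved in many important projects, including the majority of Apache NiFi.
→ The committers keep improving and innovating on the Connected Data Platform.
→ They take part in the Hadoop roadmap and are in a position to voice important requirements to the community.
Hortonworks is very deeply involved in the Apache community.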
![Page 6: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/6.jpg)
HDP Enterprise and Enterprise Plus subscriptions

Component               Enterprise   Enterprise Plus
Apache Hadoop & YARN    ✔            ✔
Apache Tez              ✔            ✔
Apache Hive             ✔            ✔
Apache Pig              ✔            ✔
Apache Sqoop            ✔            ✔
Apache Flume            ✔            ✔
Apache Mahout           ✔            ✔
Apache Ambari           ✔            ✔
Apache Oozie            ✔            ✔
Apache Falcon           ✔            ✔
Apache Knox             ✔            ✔
Apache HBase            ✔            ✔
Apache Accumulo                      ✔
Apache Storm                         ✔
HDP Advanced Security                ✔
Apache Solr             Separate     Separate

Component               Enterprise   Enterprise Plus
Apache Hadoop & YARN    ✔            ✔
Apache Ambari           ✔            ✔
Apache Falcon           ✔            ✔
Apache Flume            ✔            ✔
Apache HBase            ✔            ✔
Apache Hive             ✔            ✔
Apache Knox             ✔            ✔
Apache Mahout           ✔            ✔
Apache Oozie            ✔            ✔
Apache Phoenix          ✔            ✔
Apache Pig              ✔            ✔
Apache Sqoop            ✔            ✔
Apache Tez              ✔            ✔
Apache ZooKeeper        ✔            ✔
Apache Accumulo                      ✔
Apache Kafka                         ✔
Apache Ranger                        ✔
Apache Spark                         ✔
Apache Storm                         ✔
Apache Solr             Separate     Separate

What the HDP subscription includes
• 24x7 global support, 365 days a year
• Web and phone support (Japanese-language contact available)
• Ability to request bug fixes and enhancements
• Access to upgrades, updates, and patches
• Multi-year Z-Stream maintenance for previous HDP releases
• Access to the customer support portal and knowledge base
• Access to Hortonworks web-based self-learning
• Remote troubleshooting and analysis assistance for:
  – Configuration questions and cluster management
  – Performance issues
  – Data loading, processing, and query issues
• Remote advice on application development questions
• Checkpoint review every six months (quarterly with Enterprise Plus)
![Page 7: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/7.jpg)
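The value of the HDP subscription service
- Application development inquiries
- Consultation on analytics and additional use cases, for example:
  - We want to run this kind of analysis; what data should we collect?
  - Which components do we need to achieve what we want to do?
- SmartSense, a proactive system health check service powered by machine learning
Because Hortonworks employs many Hadoop development engineers, it can deliver this service with confidence. The service also supports customers who bring the work in-house. Even user companies with their own Hadoop engineers, developers, and committers make effective use of subscription support.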
![Page 8: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/8.jpg)
Agenda
OLAP options on Hadoop
Overview
Apache Kylin
Apache Druid
Solution Architecture
Wrap up
![Page 9: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/9.jpg)
SQL evolution on Hadoop
[Chart: capabilities of SQL engines on Hadoop, plotted by speed and feature]
Categories: Batch SQL, Interactive SQL, Sub-Second SQL, OLAP / Cube, ACID / MERGE
Engines: Hive 0.x (MapReduce); Hive 1.2- (Tez, Vectorize, ORC, CBO); Hive 2.0 (LLAP); Presto; Impala; Drill; Spark SQL; HAWQ; Kylin; Druid; Hive (WIP)
Other labels: MPP; Commercial (Kyvos Insights, AtScale)
![Page 10: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/10.jpg)
SQL evolution on Hadoop
[Chart repeated from page 9]
![Page 11: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/11.jpg)
![Page 12: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/12.jpg)
Apache Kylin
→ An OLAP engine developed at eBay.
→ Open-sourced in October 2014.
→ Became an Apache Top-Level Project in 2015.
→ How to pronounce it:
– きりん (kirin)
– かいりん (kairin)
– ちりん (chirin)
![Page 13: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/13.jpg)
Apache Kylin – Motivation
→ eBay originally did OLAP-style analysis on Hive but was not satisfied with the speed.
→ Typical OLAP queries were expected to respond within a few seconds to about 10 seconds.
→ At the time, no open source software met that requirement.
![Page 14: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/14.jpg)
Apache Kylin – Architecture
[Architecture diagram]
Kylin components: REST API, Query Engine, Router, Cube Builder, Metadata
Storage: Hive, HBase (Cube)
Clients: 3rd-party apps (REST API), BI tools (JDBC/ODBC)
![Page 15: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/15.jpg)
Apache Kylin – Interface
Queries
→ ANSI SQL
→ No direct cube exposure; just through the Hive metastore
→ No MDX
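As a rough illustration of the REST path, the sketch below builds (but does not send) a SQL query request for Kylin's query endpoint. The host, port, credentials, project name, and the `kylin_sales` table are assumptions taken from a typical sample setup, not from the slides.

```python
import base64
import json
import urllib.request


def build_kylin_query_request(base_url, user, password, sql, project, limit=100):
    """Build a POST request for Kylin's SQL query endpoint without sending it."""
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # Kylin's REST API uses HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Hypothetical values: adjust to your own deployment.
request = build_kylin_query_request(
    "http://localhost:7070",
    "ADMIN",
    "KYLIN",
    "SELECT part_dt, SUM(price) AS total FROM kylin_sales GROUP BY part_dt",
    "learn_kylin",
)
# To execute: urllib.request.urlopen(request) against a running Kylin server.
```

The SQL is written against the Hive table names; Kylin decides which cube answers it.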
![Page 16: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/16.jpg)
Apache Kylin – Cube Designer
→ Apache Kylin does not provide a build scheduler.
![Page 17: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/17.jpg)
Apache Kylin – Partial Cube
→ Balance between space and time
A,B,C,D
A,B,C A,B,D A,C,D B,C,D
A,B B,C B,D A,C C,D A,D
A B C D
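The lattice above can be enumerated in a few lines. This is a minimal sketch: it lists every cuboid for four dimensions and then prunes the lattice with one illustrative rule, a mandatory dimension. The function names and the choice of `A` as the mandatory dimension are assumptions for the example.

```python
from itertools import combinations


def all_cuboids(dimensions):
    """Every non-empty combination of dimensions, largest first."""
    return [
        combo
        for size in range(len(dimensions), 0, -1)
        for combo in combinations(dimensions, size)
    ]


def partial_cuboids(dimensions, mandatory):
    """Keep only the cuboids that contain every mandatory dimension."""
    required = set(mandatory)
    return [c for c in all_cuboids(dimensions) if required.issubset(c)]


full = all_cuboids(["A", "B", "C", "D"])
partial = partial_cuboids(["A", "B", "C", "D"], mandatory=["A"])
print(len(full), len(partial))  # 15 8
```

Building fewer cuboids saves storage and build time; queries that need a pruned cuboid have to aggregate from a larger one at query time.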
![Page 18: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/18.jpg)
Apache Kylin – Cube vs. HBase schema
Pre-joined / aggregated table
Dimensions: D1, D2, D3, D4
Measures: M1, M2, M3, M4
HBase row key: Cuboid ID, D1, D2, D3, D4
HBase row value: M1, M2, M3, M4
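A minimal sketch of this mapping follows. The bitmask cuboid ID and the `|`-joined string key are simplifications chosen for readability; they are assumptions and not Kylin's actual byte-level encoding.

```python
DIMENSIONS = ["D1", "D2", "D3", "D4"]


def cuboid_id(present):
    """Bitmask with one bit per dimension; D1 is the highest bit."""
    mask = 0
    for dim in DIMENSIONS:
        mask = (mask << 1) | (1 if dim in present else 0)
    return mask


def to_hbase_row(dim_values, measures):
    """Map one aggregated row to a (row key, row value) pair."""
    key_parts = [str(cuboid_id(dim_values))]
    key_parts += [dim_values[d] for d in DIMENSIONS if d in dim_values]
    return "|".join(key_parts), dict(measures)


# Hypothetical row from the cuboid that groups by D1 and D3 only.
key, value = to_hbase_row({"D1": "JP", "D3": "2016-07"}, {"M1": 42, "M2": 3.5})
print(key, value)  # 10|JP|2016-07 {'M1': 42, 'M2': 3.5}
```

Putting the cuboid ID first keeps all rows of one cuboid adjacent, so a query that maps to a single cuboid becomes a range scan.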
![Page 19: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/19.jpg)
Apache Kylin – Incremental cube build
→ Incremental build
Y-2011-2012  M-2013-1-8  D-2013-09-1-20  D-2013-09-21
• Minute-level micro cubes
• Kafka source
• In-memory cubing
• Auto merge
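The segment labels above can be read as half-open date ranges, and merging as joining ranges that touch. The sketch below assumes that reading of the labels; `merge_contiguous` is a hypothetical helper, not Kylin's merge policy.

```python
from datetime import date


def merge_contiguous(segments):
    """Merge segments whose [start, end) ranges touch into one larger segment."""
    merged = []
    for start, end in sorted(segments):
        if merged and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Assumed interpretation of the slide's segment labels.
segments = [
    (date(2011, 1, 1), date(2013, 1, 1)),    # Y-2011-2012
    (date(2013, 1, 1), date(2013, 9, 1)),    # M-2013-1-8
    (date(2013, 9, 1), date(2013, 9, 21)),   # D-2013-09-1-20
    (date(2013, 9, 21), date(2013, 9, 22)),  # D-2013-09-21
]

# Fold the two daily segments into one September segment.
september = merge_contiguous(segments[2:])
```

Each new build only adds a small segment for the newest time range; older segments stay untouched until a merge folds them together.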
![Page 20: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/20.jpg)
Apache Kylin – Streaming cubing (WIP)
[Diagram]
Stream
Last hour: Inverted Index
Before last hour: Cube
Hybrid Storage Interface
Query Engine (ANSI SQL)
![Page 21: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/21.jpg)
Apache Kylin – Who is using?
→ eBay
– 90% of queries < 5 seconds
• User Session Analysis: 26 TB, 28+ billion rows
• Traffic Analysis: 21 TB, 20+ billion rows
• Behavior Analysis: 560 GB, 1.2+ billion rows
→ Baidu
– Baidu Map internal analysis
→ Many other proofs of concept
– Huawei, Bloomberg, Law, British Gas, JD.com, Microsoft, StubHub, Tableau…
![Page 22: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/22.jpg)
Apache Kylin – Support
→ Kyligence
![Page 23: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/23.jpg)
![Page 24: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/24.jpg)
Apache Druid
→ An OLAP engine developed at Metamarkets.
→ Open-sourced in October 2012, under the GPL license at that point.
→ Moved to the Apache License 2.0 in 2015.
→ More than 150 contributors.
→ The name comes from the RPG character class.
![Page 25: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/25.jpg)
Apache Druid – Concept
→ Column-oriented distributed data store
→ Sub-second query responses
→ Real-time streaming ingestion
→ Flexible slicing and dicing
→ Automatic data summarization
→ Also uses approximation algorithms (HyperLogLog, theta sketches)
→ Petabyte scale
→ High availability
![Page 26: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/26.jpg)
Apache Druid – Motivation
→ Real-time dashboards for an ad delivery service
→ The goal was to ingest large volumes of transaction data at high speed and make it explorable.
→ Append heavy
→ Low latency
→ Multi-tenant
→ Highly available
→ Real-time alerting, actionable
![Page 27: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/27.jpg)
Apache Druid – Solutions Evaluated
→ RDBMS
– Star schema with aggregate tables
– Slow performance at large scale (up to 20 sec page load times)
– Query caching helped, but arbitrary queries were still slow
→ Key/value stores (HBase, Cassandra, Bigtable)
– Pre-aggregate all dimensional combinations
– Fast queries were achieved
– Precomputation scales exponentially
– Takes time to precompute (9 hrs with 14 dimensions)
– Not cost effective
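The exponential growth is easy to check: each dimension is either in a grouping or not, so n dimensions give 2^n groupings. A short sketch, assuming every subset of dimensions (including the empty grand-total grouping) is precomputed:

```python
def grouping_count(num_dimensions):
    """Number of dimension subsets, including the empty one (grand total)."""
    return 2 ** num_dimensions


for n in (4, 10, 14, 20):
    print(n, grouping_count(n))
# 4 16
# 10 1024
# 14 16384
# 20 1048576
```

At 14 dimensions that is already 16,384 aggregate tables to build, and every added dimension doubles the count.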
![Page 28: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/28.jpg)
Apache Druid – Architecture in early days
[Architecture diagram]
Components: Batch Data, Ingest, Historical Node (x3), Broker Node
![Page 29: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/29.jpg)
Apache Druid – Historical Node
[Diagram: Deep Storage, Disk, Memory, Segments]
→ Shared nothing
→ Main workhorses of the Druid cluster
→ Load immutable, read-optimized segments
→ Respond to queries
→ Use memory-mapped files to load segments
![Page 30: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/30.jpg)
Apache Druid – Current architecture
[Architecture diagram]
Components: Kafka, Batch Data, Ingest, Realtime Node (x2), Historical Node (x3), Broker Node, Deep Storage
![Page 31: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/31.jpg)
Apache Druid – Current architecture
![Page 32: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/32.jpg)
Apache Druid – Broker Node
→ Keeps track of segment announcements in the cluster
→ Scatters queries across historical and realtime nodes
→ Merges results from different query nodes
→ Caching layer
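The scatter and merge steps can be sketched as follows. Lists of rows stand in for historical and realtime nodes, and the function names and sample data are hypothetical; this shows the pattern, not Druid's implementation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def query_node(rows, dimension, metric):
    """Partial aggregation done by one (hypothetical) data node."""
    partial = Counter()
    for row in rows:
        partial[row[dimension]] += row[metric]
    return partial


def broker_query(nodes, dimension, metric):
    """Scatter the query to every node, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda rows: query_node(rows, dimension, metric), nodes)
        merged = Counter()
        for partial in partials:
            merged.update(partial)  # adds counts key by key
    return dict(merged)


historical = [{"country": "JP", "clicks": 10}, {"country": "US", "clicks": 5}]
realtime = [{"country": "JP", "clicks": 2}]
result = broker_query([historical, realtime], "country", "clicks")
print(result)  # {'JP': 12, 'US': 5}
```

Because each node returns an already aggregated partial result, the broker only merges small result sets rather than raw rows.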
![Page 33: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/33.jpg)
Apache Druid – Coordinator Node
→ Assigns segments to historical nodes
→ Interval-based cost function to distribute segments
→ Makes sure query load is uniform across historical nodes
→ Handles replication of data
→ Configurable rules to load/drop data
![Page 34: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/34.jpg)
Apache Druid – Interface
Queries
→ REST API
→ SQL (community effort, no ANSI compliance yet)
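For the REST path, Druid takes a JSON query posted to the broker. The sketch below builds (but does not send) a native timeseries query. The broker URL, the `ad_events` datasource, and the `impressions` metric are assumptions for illustration.

```python
import json
import urllib.request


def build_timeseries_request(broker_url, datasource, interval, metric):
    """Build a POST request carrying a Druid native timeseries query."""
    query = {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "hour",
        "intervals": [interval],
        "aggregations": [
            {"type": "longSum", "name": metric, "fieldName": metric},
        ],
    }
    return urllib.request.Request(
        url=f"{broker_url.rstrip('/')}/druid/v2/",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical datasource and metric names.
druid_request = build_timeseries_request(
    "http://localhost:8082",
    "ad_events",
    "2016-07-01T00:00:00Z/2016-07-02T00:00:00Z",
    "impressions",
)
# To execute: urllib.request.urlopen(druid_request) against a running broker.
```

Every native query carries a time interval, which matches the time-based design of the store.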
![Page 35: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/35.jpg)
Apache Druid – Pivot
→ Cube design and visualization
![Page 36: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/36.jpg)
Apache Druid – Building cube
→ Druid is a totally 'time-based' OLAP data store.
→ Basically it builds cubes based on 'time x dimensions'.
→ But what about unique counts?
– Theta Sketches (KMV): open-sourced by Yahoo!
– Predictable approximation error that can be traded off against sketch size
• k=4096: RSE of +/-3.2% -> 32768 bytes
• k=16K: RSE of +/-1.6% -> 131072 bytes
– Mergeable at query time
• 'merge rate of about 14.5 million sketches per second per processor thread'
– Intersection can be computed at query time
– Duplication insensitive
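To make the KMV idea concrete, here is a minimal K-Minimum-Values sketch. It is a teaching sketch and not the Yahoo! library: the class name, the SHA-1 based hash, and the sample data are assumptions. The sizes on the slide are consistent with storing 8 bytes per retained hash (4096 x 8 = 32768), and the error figures are close to 2 / sqrt(k).

```python
import hashlib


class KMVSketch:
    """K-Minimum-Values sketch: keep the k smallest hash values seen."""

    def __init__(self, k):
        self.k = k
        self.values = set()

    @staticmethod
    def _hash(item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)

    def add(self, item):
        # A set ignores repeated hashes, so duplicates never change the sketch.
        self.values.add(self._hash(item))
        if len(self.values) > self.k:
            self.values.discard(max(self.values))

    def merge(self, other):
        """Union of two sketches: the k smallest values of both combined."""
        merged = KMVSketch(min(self.k, other.k))
        merged.values = set(sorted(self.values | other.values)[: merged.k])
        return merged

    def estimate(self):
        if len(self.values) < self.k:
            return float(len(self.values))  # exact while under capacity
        return (self.k - 1) / max(self.values)


def build(items, k=1024):
    sketch = KMVSketch(k)
    for item in items:
        sketch.add(item)
    return sketch


monday = build(range(0, 8000))       # hypothetical user IDs
tuesday = build(range(4000, 12000))  # overlaps with Monday
both_days = monday.merge(tuesday)    # estimates the 12000 distinct users
```

Only the sketches are stored per 'time x dimensions' cell, and they can still be merged at query time, which is why unique counts survive rollup.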
![Page 37: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/37.jpg)
Apache Druid – Who is using?
![Page 38: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/38.jpg)
Data mart architecture at Yahoo, Inc.
[Architecture diagram]
Labels: Event Data, Hour ETL, Daily Rollup, Aggregate, ETL Data, Aggregate, Druid, HDFS, User Interface
Relative sizes: 1x, 24x, ?x
![Page 39: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/39.jpg)
![Page 40: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/40.jpg)
Solution Architecture for SQL analysis
[Architecture diagram]
Columns: Data Source, Source of truth, Application
Components: Kafka, HDFS, Hive, OLAP
Access paths: OLAP Access, Row Level Access
![Page 41: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/41.jpg)
Agenda
Scalable Data Warehousing on Hadoop
Overview
Apache Kylin
Apache Druid
Solution Architecture
Wrap up
![Page 42: OLAP options on Hadoop](https://reader034.fdocument.pub/reader034/viewer/2022042907/587c0ef91a28ab03768b6323/html5/thumbnails/42.jpg)
SQL evolution on Hadoop
[Chart repeated from page 9]