Talk by Sergey Zhemzhitsky, CleverData
User Segmentation in Online Advertising
Spark vs Hadoop
Sergey Zhemzhitsky, CTO, CleverDATA, May 22, 2015
cleverdata.ru | [email protected]
Company Profile
• International market business development since 2012;
• One of three leading IT companies in Russia: 43 branches in Russia and abroad, 5,500+ employees, 100K projects for 10K customers;
• Innovative data management platform (Data Exchange Service): cloud service, in-house development;
• Internet advertising solutions: Data Management Platforms, Customer Base Management, Web Analytics, Marketing Automation;
• Big Data: Data Mining, Digital Intelligence, Operational Intelligence, Low Latency and NoSQL, Cloud Computing.
Agenda
• The problem;
• Hadoop vs. Spark;
• Specifics;
• What's next.
Real Time Bidding (RTB)
[Diagram: publishers, SSP, ad networks, DSP, advertisers; tracking data]
Data Management Platform (DMP)
[Diagram: publishers, SSP, DMP, DSP, advertisers; data labels: cookie syncs, access logs, partner's data, 3rd party data, click streams]
Typical data flows
[Diagram. Components: Raw Data Store & Processing, Relational Data Store, RealTime Data Store. Flow labels: raw data; 3rd party data; user profiles; aggregates]
Typical data flows :: RTB
[Diagram. Components: Exchange / SSP, RTB SRV, Raw Data Store & Processing, Relational Data Store, RealTime Data Store. Flow labels: bid req. / bid resp.; pixels :: impressions :: clicks; bid requests; raw data; 3rd party data; user profiles; aggregates]
1st-party data
[Diagram. Components: Exchange / SSP, RTB SRV, Raw Data Store & Processing, Relational Data Store, RealTime Data Store. Flow labels: bid req. / bid resp.; pixels :: impressions :: clicks; bid requests; raw data; 3rd party data; user profiles; aggregates]
1st-party data
• Why monetize?
• How to monetize?
• What to monetize with?
Why monetize?
Find all users who took part in the “Star Wars” advertising campaign
[and] saw one of the banners “Darth Vader” or “Luke Skywalker” within the last 6 days
[and] clicked on that banner
[and] visited the purchase page for Darth Vader's lightsaber
[and] yet did not buy anything,
in order to retarget them with a personalized banner offering a 40% discount on the lightsaber.
How to monetize?
find all users who have
• taken part in campaign[s] “Star Wars”
• [and] viewed banner[s] “Darth Vader” or “Luke Skywalker” during [last] 6 day[s]
• [and] clicked banner[s] “Darth Vader's lightsaber”
• [and] visited buying area of “Darth Vader's lightsaber”
• [and] not visited order confirmed area of “Darth Vader's lightsaber”
[Diagram: [cookies] and their events: [impression], [click], [tr. pixel], [tr. pixel]]
id  cookie  event_id                    event_type  campaign_id  timestamp                …
1   c1      “Darth Vader”               impression  “Star Wars”  2015-04-20 14:25:11.462  …
2   c1      “Darth Vader's lightsaber”  click       “Star Wars”  2015-04-21 06:31:12.157  …
3   c1      “Darth Vader's lightsaber”  tr. pixel   “Star Wars”  2015-04-22 18:57:19.628  …
How to monetize?
find all users who have
(0) taken part in campaign[s] “Star Wars”
(1) viewed banner[s] “Darth Vader” or “Luke Skywalker” during [last] 6 day[s]
(2) clicked banner[s] “Darth Vader's lightsaber”
(3) visited buying area of “Darth Vader's lightsaber”
(4) not visited order confirmed area of “Darth Vader's lightsaber”
map: conditions 0 to 3 emit (c1, 0), (c1, 1), (c1, 2), (c1, 3); condition 4 emits nothing (Ø)
reduce: (c1, 0;1;2;3)
true(0) and true(1) and true(2) and true(3) and not false(4)
result: c1
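The map and reduce steps on this slide can be sketched in plain Python. This is only an illustration of the idea, not the code from the talk: the `Event` tuple, the function names and the `"order confirmed"` event type are assumptions, since the slides do not show how a confirmed order is logged.

```python
from collections import defaultdict, namedtuple
from datetime import datetime, timedelta

Event = namedtuple("Event", "cookie event_id event_type campaign_id timestamp")

LIGHTSABER = "Darth Vader's lightsaber"

def conditions(now, window=timedelta(days=6)):
    """Predicates 0..4, one per clause of the segment definition."""
    return [
        lambda e: e.campaign_id == "Star Wars",
        lambda e: (e.event_type == "impression"
                   and e.event_id in ("Darth Vader", "Luke Skywalker")
                   and timedelta(0) <= now - e.timestamp <= window),
        lambda e: e.event_type == "click" and e.event_id == LIGHTSABER,
        lambda e: e.event_type == "tr. pixel" and e.event_id == LIGHTSABER,
        # Hypothetical event type, not taken from the slides.
        lambda e: e.event_type == "order confirmed" and e.event_id == LIGHTSABER,
    ]

def map_event(event, preds):
    """map: emit (cookie, i) for every condition i the event satisfies."""
    return [(event.cookie, i) for i, pred in enumerate(preds) if pred(event)]

def reduce_cookie(matched):
    """reduce: conditions 0..3 must all be present, condition 4 absent."""
    return {0, 1, 2, 3} <= matched and 4 not in matched

def segment(events, now):
    preds = conditions(now)
    by_cookie = defaultdict(set)  # stands in for the shuffle phase
    for event in events:
        for cookie, i in map_event(event, preds):
            by_cookie[cookie].add(i)
    return sorted(c for c, matched in by_cookie.items() if reduce_cookie(matched))

# The three rows of the event table.
EVENTS = [
    Event("c1", "Darth Vader", "impression", "Star Wars",
          datetime(2015, 4, 20, 14, 25, 11, 462000)),
    Event("c1", LIGHTSABER, "click", "Star Wars",
          datetime(2015, 4, 21, 6, 31, 12, 157000)),
    Event("c1", LIGHTSABER, "tr. pixel", "Star Wars",
          datetime(2015, 4, 22, 18, 57, 19, 628000)),
]

print(segment(EVENTS, now=datetime(2015, 4, 23)))  # ['c1']
```

The "current" date of 2015-04-23 is chosen arbitrarily so that the impression falls inside the 6-day window.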
Hadoop vs. Spark
Spark :: Size
Before looking at Hadoop
Map-Reduce :: Size
Materials and tools
Hardware (3 nodes)
• 12-core AMD Opteron™ 6338P, ~2.8 GHz;
• 64 GB RAM;
• 1 Gbps NICs.
Software
• CDH 5.3.1 (Hadoop 2.5.0);
• Spark 1.2.0.
Data
• 14.2 GB of raw data;
• 61.1 M transactions;
• 128 MB block size.
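For a sense of scale, the data figures above can be turned into a rough number of input splits and an average record size. This is back-of-the-envelope arithmetic only, assuming 1 GB = 1024 MB and one input split per HDFS block; the real number of map tasks depends on the file layout and input format.

```python
import math

RAW_DATA_GB = 14.2      # from the slide
TRANSACTIONS = 61.1e6   # from the slide
BLOCK_SIZE_MB = 128     # from the slide

# Assumption: one input split per 128 MB block of a splittable input.
input_splits = math.ceil(RAW_DATA_GB * 1024 / BLOCK_SIZE_MB)

# Average size of one transaction record in bytes.
avg_record_bytes = RAW_DATA_GB * 1024 ** 3 / TRANSACTIONS

print(input_splits)             # 114
print(round(avg_record_bytes))  # 250
```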
MR vs Spark :: Execution time
Spark :: Exec-cores vs Num-execs
MR vs Spark :: Initialization
MR
• protected void setup(Context ctx);
• o.a.h.c.Configured;
• distributed cache.
Spark
• mapRegion;
• broadcast vars.
MR vs Spark :: Parallelism
MR
• mapred.reduce.tasks;
• mapreduce.job.reduces;
• splittable formats.
Spark
• spark.default.parallelism;
• num-executors, executor-cores in yarn;
• numTasks in groupByKey, reduceByKey, aggregateByKey…
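How the executor settings and the number of tasks interact can be shown with simple arithmetic: executors times cores gives the number of tasks that run at once, and the tasks of a stage are processed in "waves" of that size. The function name and all numbers below are hypothetical, and the sketch assumes each task needs one core (the default for spark.task.cpus).

```python
import math

def task_waves(num_tasks, num_executors, executor_cores, task_cpus=1):
    """Return (concurrent task slots, waves needed to run num_tasks)."""
    slots = num_executors * (executor_cores // task_cpus)
    return slots, math.ceil(num_tasks / slots)

# Two hypothetical layouts with the same total number of cores.
print(task_waves(num_tasks=114, num_executors=3, executor_cores=4))  # (12, 10)
print(task_waves(num_tasks=114, num_executors=6, executor_cores=2))  # (12, 10)
```

Both layouts give the same slot count, so any difference in run time between them would come from other factors such as memory per executor or data shared within a JVM.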
MR vs Spark :: Dependencies
MR
• o.a.h.u.Tool;
• o.a.h.u.ToolRunner;
• -conf app.conf;
• -files;
• -libjars;
• setUserClassesTakesPrecedence.
Spark
• --jars;
• --files;
• --conf;
• --driver-java-options;
• spark.driver.extraJavaOptions;
• spark.executor.extraJavaOptions;
• spark.driver.userClassPathFirst;
• spark.executor.userClassPathFirst.
MR vs Spark :: Secondary Sort
MR
• setSortComparatorClass;
• setGroupingComparatorClass;
• setPartitionerClass.
Spark
• repartitionAndSortWithinPartitions;
• mapPartitions;
• Entire partition processing result must be able to fit in memory.
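The secondary-sort pattern named above (partition by the natural key, sort each partition by a composite key, then walk the partition group by group) can be imitated in plain Python. This is a single-process sketch, not Spark code: the function names are hypothetical and `crc32` merely stands in for a hash partitioner.

```python
from itertools import groupby
from zlib import crc32

def partition_of(cookie, num_partitions):
    # Partition on the cookie only, so all events of a cookie stay together.
    return crc32(cookie.encode("utf-8")) % num_partitions

def repartition_and_sort(records, num_partitions):
    """records: iterable of ((cookie, timestamp), value) pairs."""
    partitions = [[] for _ in range(num_partitions)]
    for (cookie, ts), value in records:
        partitions[partition_of(cookie, num_partitions)].append(((cookie, ts), value))
    for part in partitions:
        part.sort(key=lambda kv: kv[0])  # sort by the composite (cookie, timestamp) key
    return partitions

def events_per_cookie(partition):
    """Per-partition pass: yield each cookie with its values in time order."""
    for cookie, group in groupby(partition, key=lambda kv: kv[0][0]):
        yield cookie, [value for _, value in group]

records = [
    (("c1", 3), "tr. pixel"),
    (("c2", 1), "impression"),
    (("c1", 1), "impression"),
    (("c1", 2), "click"),
]
for part in repartition_and_sort(records, num_partitions=2):
    for cookie, values in events_per_cookie(part):
        print(cookie, values)
```

Because each partition arrives already sorted, the per-cookie pass only has to hold one cookie's events at a time, which is the main reason to sort during the shuffle rather than afterwards.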
MR vs Spark :: Testing
MR
• MRUnit;
• o.a.h.h.MiniDFSCluster;
• o.a.h.m.MiniMRCluster;
• o.a.h.y.s.MiniYARNCluster;
• o.a.h.m.v2.MiniMRYarnCluster.
Spark
• Local executor.
What's next, and why Spark?
• Spark Streaming;
• Micro Batches;
• λ-architecture.
All without major surgery.
Thank you for your questions!
[email protected] :: [email protected]
cleverleaf.co.uk :: cleverdata.ru
1dmp.io :: crawler.1dmp.io
facebook.com/CleverData :: +7 (495) 967-66-50