BDX 2016 - Tal sliwowicz @ taboola
Transcript of BDX 2016 - Tal sliwowicz @ taboola
![Page 1: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/1.jpg)
Taboola’s Road to Scale
The Data Perspective
Tal Sliwowicz
![Page 2: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/2.jpg)
Copyright ©2016 The Nielsen Company. Confidential and Proprietary.
Tal Sliwowicz
Director, R&[email protected]
Who am I?
![Page 3: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/3.jpg)
You’ve Seen Us Before!
Enabling people to discover information at that moment when they’re likely to engage
![Page 4: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/4.jpg)
Entertainment | Lifestyle
Tech
Our Clients are All Around the Globe
![Page 5: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/5.jpg)
750M monthly unique users
100K+ requests/sec
10B+ recommendations/day
5TB+ daily data
REACH    PROPERTY
95.5%    Google Ad Network
87.8%    Taboola
86.2%    Google Sites
61.5%    Facebook
60.3%    Yahoo Sites
56.6%    Outbrain
(US desktop users reached, 12/2015)
52% mobile traffic
48% desktop traffic
Taboola in Numbers
![Page 6: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/6.jpg)
Context
Metadata Region-based
Location
Recommendations
User Behavior
Cookie Data
Collaborative Filtering
Bucketed Consumption Groups
CONTENT RECOMMENDATION ENGINE
Social
Facebook / Twitter API
The Recommendation Engine
![Page 7: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/7.jpg)
Taboola’s Discovery Platform
Traffic Acquisition
Business Dev.
Sponsored Content
Editorial
Newsroom
Sales
Native Ads
Audience Dev.
Product
Personalization
Data & Insights
![Page 8: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/8.jpg)
• Events and logs (raw data) written directly to DB
• Recs are read from DB
• Crashed when CNN launched
Taboola 2007
Frontend
FE Server
![Page 9: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/9.jpg)
• Same as before, but without direct write to DB
• Switching to bulk load
• But: very basic reporting, not scalable
Taboola 2007.5
Frontend
Bulk Load
FE Server
![Page 10: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/10.jpg)
• Introduced semi-real-time event parsing services: Session Parser and Session Analyzer
• Divided analysis work by unit (session)
• Files were pushed from Rec Server(s) to Backend processing
• Files are gzipped, textual INSERT statements
• But: not real-time enough
Taboola 2008
Frontend
NFS
Backend
FE Server    Session Parser    Session Analyzer
Write Summarized Data
Write raw data
Read session files
Read raw data
Write session files
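A minimal sketch of what "gzipped, textual INSERT statements" could look like in practice. The table name, column names, and file name are hypothetical; the slide does not describe the actual schema or file layout.

```python
import gzip
import os
import tempfile


def sql_literal(value):
    """Render a Python value as a SQL literal (numbers bare, strings quoted)."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def to_insert(table, row):
    """Render one row as a textual INSERT statement (hypothetical schema)."""
    columns = ", ".join(row)
    values = ", ".join(sql_literal(v) for v in row.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values});"


def write_batch(path, table, rows):
    """Write one gzip-compressed text file with one INSERT per line."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write(to_insert(table, row) + "\n")


def read_batch(path):
    """Read the statements back, as a backend loader might before executing them."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# Hypothetical session rows; the real event fields are not given in the slides.
rows = [
    {"session_id": "s1", "view_id": 1, "event": "click"},
    {"session_id": "s1", "view_id": 2, "event": "o'clock"},
]
path = os.path.join(tempfile.mkdtemp(), "session_s1.sql.gz")
write_batch(path, "session_events", rows)
statements = read_batch(path)
```

Files like this are easy to push between servers and replay, but every row still has to go through the SQL parser on load, which is one plausible reason the approach was "not real-time enough".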
![Page 11: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/11.jpg)
• Made a leap towards real-time stream processing
• Unified Session Parser and Session Analyzer into an in-memory service (without going through disk)
• Made dramatic optimizations to memory allocation and data models
• Failure-safe architecture: can endure data delays and front-end server malfunctions
• No direct DB access (key for performance); bulk loading is used only for loading hourly data
Taboola 2010
Frontend
NFS
Backend
FE Server    Session Parser + Analyzer
Write Hourly Data (Bulk Loading)
Write raw data
Read raw data
![Page 12: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/12.jpg)
• Multi-DC
• Roughly the same architecture
• Handled backend growth by scaling up (monster machines)
• Introduced real-time analyzers
• Introduced sharding
• Moved to lsync-based file sync
• Introduced Top Reports capabilities
Taboola 2011-2013
Frontend
Lsync
Backend
FE Server    Session Parser + Analyzer
Write Hourly Data (Bulk Loading)
Write raw data
Read raw data
![Page 13: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/13.jpg)
Taboola 2014-
![Page 14: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/14.jpg)
• Lots of incoming traffic (100K requests/sec)
• Data (5+ TB/day):
  • Personalized served recommendations: per user, per page view
  • Events: what the user actually read and what he did
• The data needs to be joined and processed in real time
  • Campaigns Management
  • Recommendations
  • Billing
  • Reports
  • Etc.
• The data needs to be available for offline research
Our Data Requirements
![Page 15: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/15.jpg)
Data Model
Users
Sessions
Views
Requests
Items
Events
![Page 16: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/16.jpg)
• We care about sessions: a chain of page views and events for a specific user
  • Length can be hours or even days
• We care about users: a chain of sessions across sites
  • Length can be days or even months
• Stateless application: a single user's data is sent from multiple data centers and multiple servers
  • No deterministic affinity to a server or DC
  • Order isn’t guaranteed
  • Must be robust and automatically deal with late arrivals
  • “Exactly once” semantics
Challenges
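The last three requirements (no ordering guarantee, late arrivals, "exactly once") can be illustrated with a small in-memory sketch. It assumes each message carries a unique `message_id` and a timestamp `ts`; both field names are hypothetical and not taken from the slides. Writes are keyed by message ID, so a redelivered message replaces itself instead of being counted twice, and ordering is restored only at read time.

```python
class SessionStore:
    """In-memory stand-in for a session store with idempotent writes."""

    def __init__(self):
        # session_id -> {message_id: message}
        self._sessions = {}

    def apply(self, message):
        """Store a message; a duplicate delivery overwrites the same slot."""
        session = self._sessions.setdefault(message["session_id"], {})
        session[message["message_id"]] = message

    def timeline(self, session_id):
        """Return the session's messages in event-time order, whatever the arrival order."""
        messages = self._sessions.get(session_id, {}).values()
        return sorted(messages, key=lambda m: (m["ts"], m["message_id"]))


store = SessionStore()
arrivals = [
    {"session_id": "s1", "message_id": "m3", "ts": 30, "kind": "click"},
    {"session_id": "s1", "message_id": "m1", "ts": 10, "kind": "page_view"},
    {"session_id": "s1", "message_id": "m3", "ts": 30, "kind": "click"},  # duplicate delivery
    {"session_id": "s1", "message_id": "m2", "ts": 20, "kind": "recommendation"},  # late arrival
]
for message in arrivals:
    store.apply(message)

ordered_ids = [m["message_id"] for m in store.timeline("s1")]
```

Because the store never depends on which server or data center delivered a message, any number of stateless writers can feed it. This is one common way to get an exactly-once result on top of at-least-once delivery, not necessarily the mechanism used in production.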
![Page 17: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/17.jpg)
• Many streams of data that need to be joined (user, session, page view, widgets, recommendations, events, actions)
• 5+ TB of daily data
• Research purposes require looking at full user activity across time
Challenges Cont.
![Page 18: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/18.jpg)
Data Flow
FE Servers
Kafka
FE Consumer (Spark)
C* Sessions
![Page 19: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/19.jpg)
• Partition key: session start hour + user bucket (0-9,999)
• Clustering key: publisher_id, user_id, session_id, view_id, data_type, data_hash
• Data type: MULTI_REQUEST, USER_EVENT, ACTION_CONVERSION, …
• Data: blobs of protobuf
• Results:
  • All the data of a single session is in one place, regardless of time of arrival
  • Idempotent process: if the same message is received twice, it overwrites the previous arrival because it has the same hash ID
  • Sampling is built into the model
Table Model in C*
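The key layout above can be modeled in a few lines of plain Python to show why it gives idempotent writes and built-in sampling. This is a sketch under assumptions: the slides do not say how the user bucket or `data_hash` is computed, so the MD5-based bucket and the SHA-1 payload hash here are illustrative choices, and a nested dict stands in for the Cassandra table.

```python
import hashlib
from datetime import datetime, timezone

NUM_BUCKETS = 10_000  # slide: user bucket in 0-9,999


def user_bucket(user_id):
    """Map a user to a stable bucket (assumed: hash of the user ID modulo 10,000)."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def partition_key(session_start, user_id):
    """Partition key: session start hour + user bucket."""
    hour = session_start.replace(minute=0, second=0, microsecond=0)
    return (hour.isoformat(), user_bucket(user_id))


def clustering_key(publisher_id, user_id, session_id, view_id, data_type, payload):
    """Clustering key ending in a hash of the payload (assumed: SHA-1 of the blob)."""
    data_hash = hashlib.sha1(payload).hexdigest()
    return (publisher_id, user_id, session_id, view_id, data_type, data_hash)


table = {}  # partition key -> {clustering key: payload blob}


def upsert(session_start, publisher_id, user_id, session_id, view_id, data_type, payload):
    """Write a row; the same message always maps to the same key, so replays overwrite."""
    pk = partition_key(session_start, user_id)
    ck = clustering_key(publisher_id, user_id, session_id, view_id, data_type, payload)
    table.setdefault(pk, {})[ck] = payload


def in_sample(pk, percent):
    """Sampling by bucket range: a 1% sample is buckets 0-99."""
    return pk[1] < NUM_BUCKETS * percent // 100


start = datetime(2016, 1, 1, 10, 42, tzinfo=timezone.utc)
for _ in range(2):  # the same message delivered twice
    upsert(start, "pub-1", "user-42", "sess-7", "view-1", "USER_EVENT", b"clicked")
upsert(start, "pub-1", "user-42", "sess-7", "view-2", "MULTI_REQUEST", b"recs")
pk = partition_key(start, "user-42")
```

After these three writes the table holds one partition with two rows: the duplicate `USER_EVENT` collapsed into a single row. Note that the partition is derived from the session start time, so a late message must carry its session's start time to land next to the rest of the session.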
![Page 20: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/20.jpg)
Traffic Processor (Spark)
Manual runner
Next Gen. Reports
Next Gen. Counters (Spark)
Zeppelin    BigQuery
Data Flow Cont.
C* Sessions
Hadoop    Vertica
![Page 21: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/21.jpg)
• Raw data: real-time full access to the raw data, not just aggregated data
  • Week of data (~35 TB): 2 hours to analyze and report
  • 10 physical nodes, 320 cores, 2.5 TB memory, SSDs
  • Analyzing a 1% sample of the users reduces this linearly (partition key)
  • Analyzing a single publisher which is 1% of the data reduces this almost linearly (clustering key)
• Reporting: minutes for availability of full reporting vs. hours
• Supporting our growth: Spark as a distributed computing engine is very strong, easy to scale and extend
Before vs. After
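A back-of-the-envelope check of the figures above, written as a small script. It assumes decimal terabytes (10^12 bytes) and an evenly spread workload, and it treats the "linear" reduction for a 1% sample as exact; real jobs carry fixed startup and scheduling overhead that this ignores.

```python
TB = 10**12  # assumption: decimal terabytes

week_bytes = 35 * TB      # ~35 TB for a week of data
scan_seconds = 2 * 3600   # 2 hours to analyze and report
nodes, cores = 10, 320    # cluster size from the slide

# Implied sustained throughput if the work is spread evenly.
cluster_rate = week_bytes / scan_seconds
per_node_rate = cluster_rate / nodes
per_core_rate = cluster_rate / cores

# Idealized time for a 1% user sample, assuming perfectly linear scaling.
sample_seconds = scan_seconds * 0.01

print(f"cluster:   {cluster_rate / 1e9:.2f} GB/s")
print(f"per node:  {per_node_rate / 1e6:.1f} MB/s")
print(f"per core:  {per_core_rate / 1e6:.2f} MB/s")
print(f"1% sample: {sample_seconds:.0f} s")
```

Under these assumptions the cluster sustains roughly 4.9 GB/s, or about 15 MB/s per core, and a 1% sample would take on the order of a minute rather than two hours.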
![Page 22: BDX 2016 - Tal sliwowicz @ taboola](https://reader033.fdocument.pub/reader033/viewer/2022051503/587fad221a28ab107e8b4b15/html5/thumbnails/22.jpg)
• Long-term data access: Hadoop, Cassandra and BigQuery provide a solution we did not have before
• Analytics engine: the move from MySQL to Vertica (as an MPP engine) allows us to support complex queries over very large data sets
• Algorithmic research and modeling: we are now capable of in-depth analysis on multiple dimensions across long time periods
Before vs. After - Cont.