Big Data Workflow
丁来强
Agenda
• Background
  • Definition
  • Role in Data Infra
• Requirement
  • Problem
  • Challenges
  • Requirement
• Solutions
  • Overview
  • Luigi
  • Airflow
• Demo
You will learn:
• The role of the workflow scheduler in the data engineering ecosystem.
• Challenges and key requirements.
• Solutions and their general differences.
• Architecture, design and practices of using Airflow and Luigi in Python.
• Pitfalls and common design patterns when using a workflow scheduler.
Definition
Big Data Workflow Scheduler
Schedules and manages the dependencies of workflows of jobs in data infrastructure; mainly used in offline and near-line systems.
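As a rough illustration of what managing the dependencies of a workflow of jobs means, the sketch below orders a few jobs so that each one runs only after its upstream jobs. The job names and the `run_order` helper are hypothetical, and the code uses Python's standard `graphlib` module (Python 3.9+) rather than any of the schedulers discussed in the talk.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}


def run_order(workflow):
    """Return one valid execution order in which dependencies come first."""
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    for job in run_order(WORKFLOW):
        print("run", job)
```

A real scheduler adds triggering, retries and state tracking on top of this ordering step.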
Big Data Work-flow Scheduler
Work-flow & Dependency
Jobs & Tasks
Big Data Systems
Scheduler
Different from the categories below:
• BPM
  • Like Activiti
• Middleware workflow & SOA
  • Like AWS Simple Workflow
• Pure data-driven pipeline / API for development
  • Like Apache Crunch, Apache Cascading, AWS Data Pipeline, Azure Data Factory
• Pure streaming process
  • Like Storm, Spark Streaming
LinkedIn Data Infra
http://www.slideshare.net/amywtang/linkedin-bigdata-yaleoct2012final
Data of workflow schedulers in Big Data
• 14 boxes dedicated to the work-flow system
• 8,000 tasks daily
• Maintain 3 instances of the work-flow system
• 2,500 flows, 30,000 jobs daily
• 2,000+ tasks, 10,000+ Hadoop jobs daily
Airflow
Fragile process
Job A
Job B
Job C
Job X
Push to Production
Push to QA
A/B Testing
Alert on failure
1. Scheduled triggers are skipped because a sub-system is unavailable
2. Jobs fail because a system or the network is temporarily unavailable
3. Some errors or bugs may exist in some jobs' logic
4. Performance is slow, especially for some critical steps
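Temporary system or network unavailability is usually handled by retrying the job. The following is a minimal sketch of that idea in plain Python; `run_with_retries`, its parameters and the exponential backoff policy are assumptions made for illustration, not the behaviour of any specific scheduler.

```python
import time


def run_with_retries(job, retries=3, delay_seconds=1.0, sleep=time.sleep):
    """Call job(); on an exception, wait and retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return job()
        except Exception:
            if attempt >= retries:
                raise  # give up: the failure is probably not transient
            attempt += 1
            # Exponential backoff: 1x, 2x, 4x ... the base delay.
            sleep(delay_seconds * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. Retrying helps only with transient failures; a bug in a job's logic will fail on every attempt and should surface as an alert.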
Basic Needs
Failure Tolerance & Backfill
Calendar Based Scheduling
Log Access & Monitoring
Notification
Work-flow & Dependency
Job Definition
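Backfill, in this context, means finding the scheduled runs that were skipped or never completed and running them afterwards. A small sketch of the bookkeeping, assuming a fixed schedule interval; the function name and the `completed` set are hypothetical:

```python
from datetime import datetime, timedelta


def missing_runs(start, end, interval, completed):
    """List schedule times in [start, end) that have no completed run yet."""
    missing = []
    current = start
    while current < end:
        if current not in completed:
            missing.append(current)
        current += interval
    return missing


if __name__ == "__main__":
    done = {datetime(2016, 9, 1, 0), datetime(2016, 9, 1, 8)}
    for when in missing_runs(datetime(2016, 9, 1), datetime(2016, 9, 2),
                             timedelta(hours=4), done):
        print("backfill", when)
```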
Advanced Needs
Scalability
SLA Monitor & Alert
Complex Rule
Operator OOB
High Availability
Programmatic
Advanced Needs (cont'd)
Queue (Affinity)
Pool (Limit concurrency + priority)
Data Profiling
Plugins
Versioning
Event Driven Scheduler
Solution Overview

| Basic Info | Luigi | Airflow | Azkaban | Oozie |
|---|---|---|---|---|
| Language | Python | Python | Java | Java |
| GitHub Stars | 5,274 | 3,422 | 780 | 354 |
| Contributors | 256 | 178 | 37 | 18 |
| Latest Version | 2.3.1 | 1.7.1 | 3.1 | 4.2 |
| History | 4 years | 1+ years | 6+ years | 6+ years |
| Invented by | Spotify | Airbnb | LinkedIn | Yahoo |
| Owned by | Spotify | Apache Incubator | Apache | Apache |
Azkaban
• Pros:
  • Born for Hadoop
    • Supports all Hadoop, Hive and Pig versions
  • Easy-to-use Web UI
    • Good job visualization and monitoring
  • Flexible module structure / plugins
• Cons:
  • Configuration based on properties files
  • Web UI only, no CLI or REST interfaces (needs the 3rd-party Azkaban CLI)
  • Limited execution path control
Oozie
• Pros:
  • Born for Hadoop
  • CLI, HTTP and Java API interfaces
  • Supports extended alert integration
• Cons:
  • Higher learning curve
  • PDL-style, XML-based configuration
  • Limited Web UI (needs Cloudera Hue)
  • No resource control
Luigi: Overview
• Pros:
  • Programmatic, in Python
  • Modeling is simple, code is mature (~20K LOC)
  • Good Hadoop support (MR, logs, dist)
  • Test friendly, supports a local scheduler
• Cons:
  • Web UI is very limited
  • No built-in trigger (needs cron)
  • Not designed for large scale (>100K tasks)
  • No support for distributed execution
Task Definition
Output of the Task: return one or more Targets
Set up dependencies: return one or more Tasks
Logic
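The three parts of a task definition (output, dependencies, logic) can be sketched in plain Python as follows. The method names `requires`, `output` and `run` mirror those of Luigi's `Task` class, but this `Task` base class, the `build` helper and the two example tasks are hypothetical stand-ins written only with the standard library; file paths play the role of targets.

```python
import os
import tempfile


class Task:
    """Minimal stand-in for a workflow task (not the real luigi.Task)."""

    def requires(self):
        """Set up dependencies: return the upstream tasks."""
        return []

    def output(self):
        """Output of the task: here, a file path acting as the target."""
        raise NotImplementedError

    def run(self):
        """The task's logic."""
        raise NotImplementedError

    def complete(self):
        return os.path.exists(self.output())


def build(task, log):
    """Run upstream tasks first; skip any task whose output already exists."""
    if task.complete():
        return
    for upstream in task.requires():
        build(upstream, log)
    task.run()
    log.append(type(task).__name__)


class WriteNumbers(Task):
    def __init__(self, workdir):
        self.workdir = workdir

    def output(self):
        return os.path.join(self.workdir, "numbers.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("1\n2\n3\n")


class SumNumbers(Task):
    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return [WriteNumbers(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "sum.txt")

    def run(self):
        with open(self.requires()[0].output()) as src:
            total = sum(int(line) for line in src)
        with open(self.output(), "w") as dst:
            dst.write(str(total))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        executed = []
        build(SumNumbers(workdir), executed)
        print(executed)
```

Because completeness is derived from the existence of the output, running `build` a second time does nothing, which is the de-duplication behaviour this style of scheduler relies on.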
Architecture Notes
• Mainly manages dependencies and de-duplicates running tasks.
• Mainly focuses on data pipeline ETL.
• Limitations
  • No calendar trigger
  • Web UI is very limited
  • Worker and scheduler are too tightly coupled (does not support >100K tasks)
  • Execution is bound to a specific worker
Task and Targets Library
• Google BigQuery
• Hadoop jobs
• Hive queries
• Pig queries
• Scalding jobs
• Spark jobs
• PostgreSQL, Redshift, MySQL tables
• and more…
Airflow: Overview
• Pros (we will see):
  • More general, flexible architecture
  • Very compelling Web UI
  • Lots of cool features OOB, rich Operator library
  • Fast-growing adoption (30+ companies)
  • Test friendly (test mode and Sequential Scheduler)
• Cons:
  • Code quality is not so mature (UT coverage is not high)
  • No event-driven scheduler (same as all the other solutions)
Airflow Tech Stack
• Python code (<20K LOC)
• DB: SQLAlchemy
• Celery for distributed execution
• Web server: Flask / gunicorn
• UI: d3.js / Highcharts / Pandas
• Templating: Jinja2
DAG (Directed Acyclic Graph)
DAG: a collection of tasks w/ scheduling settings
Task: an instance of BashOperator; supports templating
Set up the dependencies
A task of another kind: PythonOperator
DAG execution
[Diagram: Dag1 contains Task1, Task2 and Task3, built from HiveOperator, PigOperator and PythonOperator, with HiveHook and PigHook. Dag1 Run (2016-9-1), Dag1 Run (2016-9-2) and Dag1 Run (2016-9-3) each hold task instances such as Task1 Instance (2016-9-1) and Task3 Instance (2016-9-1).]
Concepts – DAG, DAG Run
• DAG
  • A collection of Tasks
  • Setting of calendar scheduling
• DagRun
  • A run instance of a DAG with a scheduled date (ID: dag, start time and interval)
Concepts – Operator, Task and TI
• Operator
  • Task templates
• Task
  • Instance of an Operator
• Task Instance (TI)
  • Belongs to a DagRun
  • A run instance of a Task with a scheduled date (ID: dag, task, start time and interval)
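The relationship between a DAG, its runs and its task instances can be modelled with a few records. This is a sketch of the identity keys described above, not Airflow's actual model classes; the field names and the `expand` helper are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DagRun:
    dag_id: str
    execution_date: datetime


@dataclass(frozen=True)
class TaskInstance:
    dag_id: str
    task_id: str
    execution_date: datetime


def expand(dag_id, task_ids, start, interval, count):
    """Create `count` DagRuns and one TaskInstance per task per run."""
    runs, instances = [], []
    for i in range(count):
        when = start + i * interval
        runs.append(DagRun(dag_id, when))
        instances.extend(TaskInstance(dag_id, t, when) for t in task_ids)
    return runs, instances


if __name__ == "__main__":
    runs, tis = expand("dag1", ["task1", "task2", "task3"],
                       datetime(2016, 9, 1), timedelta(days=1), 3)
    print(len(runs), "runs,", len(tis), "task instances")
```

Three daily runs of a three-task DAG therefore yield nine task instances, each uniquely identified by DAG, task and scheduled date.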
Concepts – Operator
• Operator
  • Task templates, general categories:
    • Sensor
    • Branching
    • Transformer
  • Settings of trigger rules, retry etc.
  • Uses a Hook for the real operation w/ external systems
Operator Library
• Google BigQuery, Cloud Storage
• AWS S3, EMR
• Spark SQL
• Docker
• Presto
• Sqoop
• Hive jobs
• Vertica
• Qubole
• SSH
• HipChat, Slack, Email
• PostgreSQL, Redshift, MySQL, Oracle etc.
• and more…
Parameterized Tasks
• Variables
  • Global parameters
• Connections
  • External system's connection string, credentials, extra parameters etc. Normally used by a Hook.
• DAG Parameters / Macros
• Templating
  • Using Jinja for batch or any places that fit
• XCom
  • Share data between Tasks
Airflow Architecture (Local Scheduler)
[Diagram: Scheduler, Workers, Web Servers and Metadata Database, with an "Invoke" arrow to external systems: Hive, HDFS, MySQL, Cascading, Spark, Presto, …]
Local Scheduler – w/ version control
[Diagram: Master Repo, Code Repo, Scheduler, Workers, Web Servers and Metadata Database, with external systems: Hive, HDFS, MySQL, Cascading, Spark, Presto, …]
Airflow Architecture (Celery Scheduler)
[Diagram: Scheduler, Workers, Web Servers, Metadata Database, Brokers (MQ) and State Store, with external systems: Hive, HDFS, MySQL, Cascading, Spark, Presto, …]
Celery Scheduler – w/ version control
[Diagram: Master Repo, Code Repo, Scheduler, Workers, Web Servers, Metadata Database, Brokers (MQ) and State Store, with external systems: Hive, HDFS, MySQL, Cascading, Spark, Presto, …]
Airflow Architecture – HA
[Diagram: Scheduler, Scheduler Slave, Workers, Web Servers, Metadata Database, Brokers (MQ) and State Store, with external systems: Hive, HDFS, MySQL, Cascading, Spark, Presto, …]
Scheduler – interval in workflow
Note: every DagRun starts only once the next DagRun's execution time has been reached.
Run at 04:00:00
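The note can be read as simple date arithmetic: a run labelled with a given execution time starts one interval later. The sketch below assumes a 4-hour interval and an execution time of 00:00, chosen so that the result matches the "Run at 04:00:00" label; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def earliest_start(execution_date, schedule_interval):
    """A run labelled `execution_date` may start once its interval has ended."""
    return execution_date + schedule_interval


def is_due(now, execution_date, schedule_interval):
    """True once the next run's execution time has been reached."""
    return now >= earliest_start(execution_date, schedule_interval)


if __name__ == "__main__":
    run = datetime(2016, 9, 1, 0, 0)
    print(earliest_start(run, timedelta(hours=4)))
```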
Scheduler – recursive running
Run 1 (09-01 00:00)
Run 2 (09-01 04:00)
What if output_file in Run 1 and Run 2 affect each other?
Idea: do your best to avoid this kind of design.
Principle when defining Task
Atomic: each Task should be atomic
- Isolated from concurrent processing
- Either succeeds or fails, no grey state
- Failure will not impact the system
Granularity: Task granularity should be proper
- Choose the "right size" for one task
- Tasks should be able to execute simultaneously
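One common way to get the "either succeeds or fails, no grey state" property for a task that produces a file is to write to a temporary file and rename it into place only at the end. The helper below is a sketch of that pattern under the assumption that the output is a local file; it is not taken from the talk.

```python
import os
import tempfile


def atomic_write(path, text):
    """Write to a temporary file, then rename it over `path` in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
        # The rename is atomic when both paths are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, leave no partial output behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A downstream task then sees either the complete output or no output at all, never a half-written file.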
Scheduler – recursive dependency
Run 1 (09-01 00:00)
Run 2 (09-01 04:00)
What if read_file in Run 1 and Run 2 cannot run in parallel due to an external system's limitation?
Option 1: assign a pool with 1 slot to task read_file
Option 2: turn on the option "depends_on_past" for task read_file
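The effect of "depends_on_past" can be sketched as a gate on the previous run of the same task: roughly, the instance in Run 2 may start only after the instance in Run 1 has succeeded. The function below is a plain-Python illustration of that rule; the state names, the run indexing and `can_run` itself are hypothetical and do not reproduce Airflow's internals.

```python
def can_run(run_index, states, depends_on_past):
    """Decide whether the task instance for run `run_index` may start.

    `states` maps a run index to the state of the same task in that run.
    """
    if not depends_on_past or run_index == 0:
        return True  # no gate, or there is no earlier run to wait for
    return states.get(run_index - 1) == "success"


if __name__ == "__main__":
    # read_file in Run 1 (index 0) is still running, so Run 2 must wait.
    print(can_run(1, {0: "running"}, depends_on_past=True))
```

A pool with a single slot reaches a similar result by limiting concurrency rather than by ordering runs.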
Scheduler – more recursive dependency
Run 1 (09-01 00:00)
Run 2 (09-01 04:00)
What if read_file in Run 2 relies on output_file in Run 1 due to a restriction or a necessarily stateful design?
Option: turn on the option "wait_for_downstream" for task read_file (this forces "depends_on_past" to be turned on)
Scheduler – recursive dependency pitfall
start_date and schedule_interval should be aligned
2016-09-08 00:00:00 is aligned
2016-09-08 02:00:00 is NOT aligned
Note: a misaligned start_date will make the DAG fail if the option "depends_on_past" is turned on
'@once' runs just one time
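One plausible reading of "aligned" is that start_date falls on a whole multiple of schedule_interval counted from midnight. The check below follows that reading and assumes a 4-hour interval, as in the Run 1 (00:00) and Run 2 (04:00) examples; both the interpretation and the `is_aligned` helper are assumptions rather than something stated in the talk.

```python
from datetime import datetime, timedelta


def is_aligned(start_date, schedule_interval):
    """True if start_date falls on a multiple of the interval since midnight."""
    midnight = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
    offset = start_date - midnight
    return offset % schedule_interval == timedelta(0)


if __name__ == "__main__":
    four_hours = timedelta(hours=4)
    print(is_aligned(datetime(2016, 9, 8, 0, 0), four_hours))
    print(is_aligned(datetime(2016, 9, 8, 2, 0), four_hours))
```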
Some other notes
• Update the DAG id when changing the logic inside
• Use SLA alerts for critical tasks
• Features planned:
  • Event-driven scheduler
  • Mesos scheduler
  • More operators
  • More syntactic sugar
Now you've learned:
• Definition and ecosystem.
• Challenges and key requirements.
• Solutions and general comparisons.
• The most important parts of Airflow and Luigi.
• Architecture, design, patterns, pitfalls and practices etc.