Google's Data Pipeline: Dataflow (Junho Lee, SA, Rockplace)
$(whoami)
• Junho Lee (이준호) / [email protected]
• Rockplace Inc. (2014 ~ )
• Solutions Architect
• Google Cloud Platform Authorized Trainer
- Google Cloud Platform Fully Qualified Developer
- Google Certified Professional Cloud Architect
- Google Certified Professional Data Engineer
- Google Certified Associate G Suite Administrator
What is the Problem?
Infinitely unbounded data stream with unknown delay
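The problem statement above can be made concrete with a small sketch. Events in a stream may arrive late and out of order, so results should be grouped by when an event happened (event time) and not by when it arrived. The plain-Python code below is only an illustration of that idea, not Dataflow or Beam API code; the 60-second window size, the function names, and the sample events are all assumptions made up for this example.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds; hypothetical window length chosen for illustration


def window_start(event_time, size=WINDOW_SIZE):
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % size)


def sum_by_event_time_window(events, size=WINDOW_SIZE):
    """Sum values per fixed event-time window, regardless of arrival order."""
    totals = defaultdict(int)
    for event_time, value in events:
        totals[window_start(event_time, size)] += value
    return dict(totals)


# (event_time_seconds, value) pairs listed in ARRIVAL order.
# The third event is delayed: it happened at t=30 but arrived after t=65.
arrivals = [(10, 1), (65, 2), (30, 3), (70, 4)]

totals = sum_by_event_time_window(arrivals)
print(totals)  # {0: 4, 60: 6}
```

The delayed event still lands in the window starting at 0 because the grouping key is derived from its own timestamp. A real streaming system must also decide when a window is complete enough to emit, which this batch-style sketch does not address.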
Google “Cloud Dataflow”
• Cloud Dataflow is a fully-managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness.
• An implementation of Google's “Dataflow” model
https://cloud.google.com/dataflow/
The “Dataflow Model” by
MapReduce
• Large-Scale Data Processing
FlumeJava
• Java library for data-parallel pipelines
MillWheel
• Fault-Tolerant Stream Processing Framework
History of Apache Beam
Dataflow Model ➜ Beam Model
• Donated by Google in 2016
• A unified programming model designed to provide efficient and portable data processing pipelines
• Multiple Runners:
  • Apache Apex
  • Apache Flink
  • Apache Spark
  • Apache Gearpump
  • Google Cloud Dataflow …
Run Everywhere
Cloud Dataflow vs On-premise
No-Ops
https://www.safaribooksonline.com/library/view/hadoop-essentials/9781784396688/ch02s05.html
Worker Lifecycle Management
Dynamic Worker Scaling
Monitoring UI
Centralized Logging
Cloud Dataflow in Big Data Lifecycle
Cloud Datalab
DEMO
https://beam.apache.org/get-started/mobile-gaming-example/
Demo: Game UserScore.py
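The slides do not include the demo code itself. As a rough idea of what a "user score" batch job computes, the plain-Python sketch below parses game events and sums the score per user. It is not the Beam pipeline from the demo and uses no Beam API. The CSV field layout (user, team, score, timestamp in milliseconds), the function names, and the sample lines are assumptions made for illustration.

```python
import csv
from collections import defaultdict


def parse_game_event(line):
    """Parse one CSV line into a dict; return None for malformed input."""
    try:
        row = next(csv.reader([line]))
        return {
            "user": row[0].strip(),
            "team": row[1].strip(),
            "score": int(row[2]),
            "timestamp_ms": int(row[3]),
        }
    except (IndexError, ValueError, StopIteration):
        return None


def sum_scores_per_user(lines):
    """Sum the score field per user, skipping lines that fail to parse."""
    totals = defaultdict(int)
    for line in lines:
        event = parse_game_event(line)
        if event is None:
            continue  # malformed line: ignore it instead of failing the job
        totals[event["user"]] += event["score"]
    return dict(totals)


# Hypothetical sample input; the last line is deliberately malformed.
sample_lines = [
    "user1_team_a,team_a,18,1447686663000",
    "user2_team_b,team_b,5,1447690263000",
    "user1_team_a,team_a,7,1447693863000",
    "not,a,valid,line",
]

user_scores = sum_scores_per_user(sample_lines)
print(user_scores)  # {'user1_team_a': 25, 'user2_team_b': 5}
```

In a pipeline framework the same two steps would typically appear as a parse transform followed by a per-key sum, which lets the runner distribute the work across workers.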
Q&A
One more thing: Cloud Dataprep