Google's Data Pipeline: Dataflow (Junho Lee, SA, Rockplace)

Post on 22-Jan-2018

Google Cloud Dataflow: Google's Data Pipeline, Dataflow

Junho Lee (이준호)

jhlee@rockplace.co.kr

$(whoami)

• Junho Lee (이준호) / jhlee@rockplace.co.kr
• Rockplace Inc. (2014 ~ )
• Solutions Architect
• Google Cloud Platform Authorized Trainer

- Google Cloud Platform Fully Qualified Developer
- Google Certified Professional - Cloud Architect
- Google Certified Professional - Data Engineer
- Google Certified Associate - G Suite Administrator

What is the Problem?

Infinitely unbounded data stream with unknown delay
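A minimal sketch of why "unknown delay" matters, in plain Python rather than any Dataflow or Beam API. It assumes fixed 60-second windows and made-up `(event_time, value)` pairs; the names `window_start` and `sum_per_window` are hypothetical. Elements are grouped by the time they were produced, not by the time they arrive, so a late element still lands in the correct window.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds; an assumed fixed-window width


def window_start(event_time, size=WINDOW_SIZE):
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % size)


def sum_per_window(events, size=WINDOW_SIZE):
    """Sum values per event-time window, regardless of arrival order."""
    totals = defaultdict(int)
    for event_time, value in events:
        totals[window_start(event_time, size)] += value
    return dict(totals)


# (event_time_seconds, value) pairs listed in ARRIVAL order (made-up data).
# The element produced at t=50 arrives after one produced at t=70,
# and the element produced at t=65 arrives after one produced at t=130.
arrivals = [(10, 1), (70, 5), (50, 2), (130, 4), (65, 3)]

result = sum_per_window(arrivals)
print(result)  # {0: 3, 60: 8, 120: 4}
```

The sketch reads a finite list, so it can wait for every element. A real unbounded stream never ends, which is why a streaming engine also has to decide when a window's result is complete enough to emit; that part is not modeled here.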

Google “Cloud Dataflow”

• Cloud Dataflow is a fully-managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness.

• An implementation of Google's “Dataflow” model

https://cloud.google.com/dataflow/

The “Dataflow Model” by

MapReduce
• Large Scale Data Processing

FlumeJava
• Java library for data-parallel pipelines

MillWheel
• Fault-Tolerant Stream Processing Framework

History of Apache Beam

Dataflow Model ➜ Beam Model

• Donated by Google in 2016
• A unified programming model designed to provide efficient and portable data processing pipelines

• Multiple Runners:
- Apache Apex
- Apache Flink
- Apache Spark
- Apache Gearpump
- Google Cloud Dataflow …
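The "define once, run on any runner" idea can be illustrated with a toy sketch. This is not the Beam API: `build_pipeline`, `run_eager`, and `run_lazy` are hypothetical names, and the two "runners" are just two execution strategies in plain Python. The point is only that the pipeline description is separate from the engine that executes it.

```python
def build_pipeline():
    """Describe the work once as an ordered list of (kind, function) steps."""
    return [
        ("map", str.strip),    # trim whitespace
        ("filter", bool),      # drop empty lines
        ("map", str.upper),    # normalize case
    ]


def run_eager(pipeline, data):
    """Hypothetical runner 1: materialize a full list after every step."""
    items = list(data)
    for kind, fn in pipeline:
        if kind == "map":
            items = [fn(x) for x in items]
        else:
            items = [x for x in items if fn(x)]
    return items


def run_lazy(pipeline, data):
    """Hypothetical runner 2: chain iterators and pull one element at a time."""
    stream = iter(data)
    for kind, fn in pipeline:
        stream = map(fn, stream) if kind == "map" else filter(fn, stream)
    return list(stream)


pipeline = build_pipeline()
lines = [" beam ", "", "flink\n", "  spark"]  # made-up input

eager_result = run_eager(pipeline, lines)
lazy_result = run_lazy(pipeline, lines)
print(eager_result)  # ['BEAM', 'FLINK', 'SPARK']
print(lazy_result)   # ['BEAM', 'FLINK', 'SPARK']
```

Both strategies return the same output for the same pipeline description, which is the property a portable model aims for across real runners.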

Run Everywhere

Cloud Dataflow vs On-premise

No-Ops

https://www.safaribooksonline.com/library/view/hadoop-essentials/9781784396688/ch02s05.html

Worker Lifecycle Management

Dynamic Worker Scaling

Monitoring UI

Centralized Logging

Cloud Dataflow in Bigdata Lifecycle

Cloud Datalab

DEMO
https://beam.apache.org/get-started/mobile-gaming-example/

Demo: Game - UserScore.py
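The transcript does not include the demo code itself. As a rough sketch of what a per-user score job computes, here is the same logic in plain Python: parse each game event and sum the scores per user over a bounded input. It deliberately avoids the Beam API so it runs with the standard library only. The CSV layout (`user,team,score,timestamp`), the sample lines, and the function names are assumptions for illustration. In Beam terms, the parsing step would typically be a `Map` or `ParDo` and the summing step a `CombinePerKey(sum)`.

```python
import csv
from collections import defaultdict


def parse_game_event(line):
    """Parse one CSV line into a dict; return None if it is malformed."""
    try:
        row = next(csv.reader([line]))
        return {"user": row[0], "team": row[1], "score": int(row[2])}
    except (IndexError, ValueError, StopIteration):
        return None


def user_scores(lines):
    """Sum the score per user, skipping lines that fail to parse."""
    totals = defaultdict(int)
    for line in lines:
        event = parse_game_event(line)
        if event is not None:
            totals[event["user"]] += event["score"]
    return dict(totals)


# Made-up sample input; the last line has a non-numeric score.
sample = [
    "user1,TeamA,10,1445230923951",
    "user2,TeamB,7,1445230923952",
    "user1,TeamA,5,1445230923953",
    "not,a,valid-score,line",
]

scores = user_scores(sample)
print(scores)  # {'user1': 15, 'user2': 7}
```

Skipping malformed lines instead of failing is a design choice of this sketch; a production job might count or log them.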

Q&A

One more thing - Cloud Dataprep