A brief history of time with Apache Flink.pdf

A Brief History of Time with Apache Flink: Real-time monitoring and analysis with Flink, Kafka and HBase. Flink Forward 2016, Berlin, September 12th, 2016. Thomas LAMIRAULT, Mohamed Amine ABDESSEMED

Transcript of A brief history of time with Apache Flink.pdf

Page 1: A brief history of time with Apache Flink.pdf

A Brief History of Time with Apache Flink

Real-time monitoring and analysis with Flink, Kafka and HBase

Flink Forward 2016, Berlin, September 12th, 2016

Thomas LAMIRAULT

Mohamed Amine ABDESSEMED

Page 2: A brief history of time with Apache Flink.pdf

The speakers

Mohamed Amine ABDESSEMED (@AminouvicTweets)

• Software engineer & solution architect @ Bouygues Telecom since 2013

• A Flinker since the early beginnings

Thomas LAMIRAULT (@thomaslamirault)

• Big data software engineer @ Bouygues Telecom since 2015

• A Flink master enthusiast

Page 3: A brief history of time with Apache Flink.pdf

Outline

• Who is Bouygues Telecom

• Data Value and Streaming Analytics

• Use case

• Challenges

• Streaming analytics with Flink

• Results


Page 4: A brief history of time with Apache Flink.pdf

WHO IS BOUYGUES TELECOM?

Mobile . Fixed . TV . Internet . Cloud

• 15M clients

• 12.1M mobile subscribers

• 2.9M fixed customers

A very innovative company

• First Android TV box

• Leader 4G/4G+

• VoLTE

• UHSM

Page 5: A brief history of time with Apache Flink.pdf

WHO IS BOUYGUES TELECOM ?


Page 6: A brief history of time with Apache Flink.pdf

LUX: Logged User eXperience

Mobile QoE

• Our Big Data platform

• Produces mobile QoE indicators from massive network equipment event logs (billions of events/day).

• Goals:

– QoE (User) instead of QoS (Machine)

– Real-time diagnostics (<60’ end-to-end latency)

– Business Intelligence

– Reporting

– Real-time alarming

Page 7: A brief history of time with Apache Flink.pdf

LUX in numbers

2015:

• ~120 users

• 5 Flink production apps

• 4 billion raw events/day

• 750 TB storage

Today:

• ~300 users

• 30 Flink production apps

• 10 billion raw events/day

• 2 PB storage

Page 8: A brief history of time with Apache Flink.pdf

Analytic Data value

[Figure: data value plotted against time around an important event occurrence. Time axis ticks: T-x, T0, T1, T2, Ty. Phases: before important event, NOW, after important event. Labeled regions: predictive analytics, streaming analytics, advanced analytics.]

Page 9: A brief history of time with Apache Flink.pdf

DATA IS MOST VALUABLE NOW!

Page 10: A brief history of time with Apache Flink.pdf

Analytic Data value

• Data is most valuable when made available as soon as important events occur.

• Get the most out of data

– Collect data fast.

– (Pre)process it fast.

– Analyze it and create added value to act faster!

Page 11: A brief history of time with Apache Flink.pdf

The use case

[Figure: GNOC, with the steps Identify, Define, Explore, Action, Look back. Call categories shown: specific numbers, emergency numbers, call center numbers. Caption: Something wrong?]

The BIG picture isn’t always significant

Global counts make no sense!

Page 12: A brief history of time with Apache Flink.pdf

The use case

• A simple and valuable use case

• Need to analyze the entire call traffic:

– Considering multiple aggregation axes

– Fine-grained analysis

– Detect when something is happening somewhere, in real time

– Compare with historical value trends

Page 13: A brief history of time with Apache Flink.pdf

Challenges

• Low-latency counters computed in a streaming fashion

• Quickly available KPIs = value

• Massive amounts of data + peak loads

• Reliability

• Multiple flow correlation

• Time management:

– Out-of-order & late events: our worst enemies

– Flexible window management

– Specific watermark emission

Page 14: A brief history of time with Apache Flink.pdf

Streaming Analytics Time Management

[Figure: processing-time axis with ticks at 8h00, 8h30, 9h00, 9h30.]

Page 15: A brief history of time with Apache Flink.pdf

Streaming Analytics Time Management

[Figure: processing-time axis with ticks at 8h00, 8h30, 9h00, 9h30, carrying individual events labeled with event times ranging from 8h00 to 9h35. Labels: Event Time, Window, FIRE.]
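To make the distinction between processing time and event time concrete, here is a minimal sketch in plain Python (not the Flink API). It buckets the same events into 30-minute tumbling windows twice: once by arrival time and once by event time. The event times follow the slide's notation, but the arrival times and the helper names (`minutes`, `window_start`, `label`) are invented for illustration.

```python
from collections import Counter

WINDOW_MINUTES = 30  # matches the 30-minute ticks on the slide's time axis


def minutes(hhmm: str) -> int:
    """Convert a slide-style time such as '8h09' to minutes since midnight."""
    hours, mins = hhmm.split("h")
    return int(hours) * 60 + int(mins)


def window_start(ts: int, size: int = WINDOW_MINUTES) -> int:
    """Start (in minutes) of the tumbling window that contains ts."""
    return ts - (ts % size)


def label(ts: int) -> str:
    """Format minutes since midnight back into the slide-style notation."""
    return f"{ts // 60}h{ts % 60:02d}"


# (event time, arrival time) pairs; the arrival times are invented for illustration.
events = [
    ("8h09", "8h10"),
    ("8h14", "8h20"),
    ("8h03", "8h40"),  # arrives one window late
    ("8h43", "8h45"),
    ("8h13", "9h05"),  # arrives two windows late
    ("9h10", "9h12"),
]

# Same events, two different clocks.
by_processing_time = Counter(label(window_start(minutes(arr))) for _, arr in events)
by_event_time = Counter(label(window_start(minutes(evt))) for evt, _ in events)

print("processing time:", dict(by_processing_time))
print("event time:     ", dict(by_event_time))
```

With these made-up arrivals, counting by arrival time spreads the events evenly over the three windows, while counting by event time shows that most of them actually belong to the 8h00 window.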

Page 16: A brief history of time with Apache Flink.pdf

Streaming Analytics with Flink

Built-in windowing functionalities

• Custom Watermark extractor

• Custom Triggers for lateness management

• Custom Key extractor

Stateful Streaming

• Checkpointing

• Fault tolerance

• Savepoints

• Update without data loss
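The custom watermark extractor and the custom trigger for lateness management listed above can be sketched roughly as follows. This is a simplified plain-Python simulation, not Flink code and not the speakers' implementation: the class name `LateAwareCounter` and the three constants are assumptions, the watermark is taken as the newest timestamp seen minus a fixed disorder bound, and state cleanup is left out.

```python
from collections import defaultdict

WINDOW = 30            # tumbling window size, in minutes (assumed)
MAX_OUT_OF_ORDER = 5   # assumed bound on disorder used to compute the watermark
ALLOWED_LATENESS = 60  # assumed extra time during which late events still count


class LateAwareCounter:
    """Counts events per event-time window, firing as the watermark advances."""

    def __init__(self):
        self.counts = defaultdict(int)   # window start -> running count
        self.fired = set()               # windows that already emitted a result
        self.watermark = float("-inf")
        self.emitted = []                # (window start, count, kind) records
        self.dropped = []                # events too late to be counted

    def on_event(self, ts: int) -> None:
        start = ts - ts % WINDOW
        end = start + WINDOW
        if end + ALLOWED_LATENESS <= self.watermark:
            self.dropped.append(ts)      # beyond the allowed lateness
            return
        self.counts[start] += 1
        if start in self.fired:
            # Late but tolerated: emit an updated result for that window.
            self.emitted.append((start, self.counts[start], "late-update"))
        # Advance the watermark: newest timestamp minus the disorder bound.
        self.watermark = max(self.watermark, ts - MAX_OUT_OF_ORDER)
        for s in sorted(self.counts):
            if s not in self.fired and s + WINDOW <= self.watermark:
                self.fired.add(s)
                self.emitted.append((s, self.counts[s], "on-time"))


counter = LateAwareCounter()
for ts in [489, 494, 523, 483, 700, 485]:  # minutes since midnight, arrival order
    counter.on_event(ts)
print(counter.emitted)
print(counter.dropped)
```

In this run the 8h00 window (start 480) fires once the watermark passes 8h30, the straggler at 483 triggers an updated count, and the final event at 485 is dropped because it arrives after the allowed lateness has expired.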


Page 17: A brief history of time with Apache Flink.pdf

Streaming analytics with Flink

Performance

• High throughput

• Low latency

• Excellent memory management

Flexible window management

• Tumbling

• Sliding

• Session
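The three window types differ only in how a timestamp is mapped to one or more windows. The following plain-Python sketch illustrates that mapping; the function names and signatures are hypothetical and are not the Flink window assigner API.

```python
def tumbling(ts: int, size: int) -> list[tuple[int, int]]:
    """The single fixed-size, non-overlapping window [start, end) containing ts."""
    start = ts - ts % size
    return [(start, start + size)]


def sliding(ts: int, size: int, slide: int) -> list[tuple[int, int]]:
    """Every window of length size, starting on a multiple of slide, that contains ts."""
    windows = []
    start = ts - ts % slide          # latest window start at or before ts
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def sessions(timestamps: list[int], gap: int) -> list[tuple[int, int]]:
    """Group timestamps into sessions; a pause of at least gap starts a new one.

    Each session is returned as (first timestamp, last timestamp).
    """
    result = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][1] < gap:
            result[-1] = (result[-1][0], ts)   # extend the current session
        else:
            result.append((ts, ts))            # start a new session
    return result


print(tumbling(489, 30))
print(sliding(489, 30, 10))
print(sessions([480, 483, 489, 523, 525, 600], 15))
```

A tumbling window gives each event exactly one home, a sliding window gives it several overlapping ones, and a session window has no fixed size at all: its bounds depend on the data.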


Page 18: A brief history of time with Apache Flink.pdf

High Level Architecture

[Figure: architecture diagram with the components Collect, Queue, Streaming Processing, Windowing Computation, In-Memory Lookups, Storage, Dataviz, Alarming.]

Page 19: A brief history of time with Apache Flink.pdf

Architectural details

[Figure: a Kafka cluster holding topics T1 .. Tn .. Tm and T’1 .. T’n .. T’m, each with partitions P1 .. Px, together with the components Analytic APP, State Backend, Storage, Historical Data, Monitoring, Alarming.]

Page 20: A brief history of time with Apache Flink.pdf

Streaming application details

• Tn: Multiple input topics

• filter: Keep only records that interest us

• Union: Correlate input flows

• Extract timestamp: Extract relevant event timestamp, with custom watermark emission

• keyBy: Group by considered aggregation axes

• window: Tumbling windows

• trigger: Custom trigger, allow configurable late data and manage lagged data

• reduce: Produce KPIs

• sink: Write data into HBase
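The shape of this pipeline can be illustrated with a small plain-Python sketch over finite lists. It is not Flink code: the topic contents, the field names (`ts`, `cell`, `type`), the choice of a simple count as the KPI, and the list standing in for the HBase sink are all assumptions made for the example, and the watermark and trigger steps are omitted because the inputs here are bounded.

```python
from collections import Counter
from itertools import chain

WINDOW = 30  # tumbling window size in minutes (assumed)

# Three hypothetical input topics; field names are invented for this sketch.
topic_a = [{"ts": 489, "cell": "A", "type": "call"},
           {"ts": 494, "cell": "B", "type": "sms"}]
topic_b = [{"ts": 483, "cell": "A", "type": "call"},
           {"ts": 523, "cell": "A", "type": "call"}]
topic_c = [{"ts": 500, "cell": "B", "type": "call"}]


def keep(record: dict) -> bool:
    """filter: keep only records that interest us (here, calls)."""
    return record["type"] == "call"


def key_and_window(record: dict) -> tuple[str, int]:
    """keyBy + window: aggregation axis plus tumbling window start."""
    return record["cell"], record["ts"] - record["ts"] % WINDOW


def run(*topics) -> dict:
    filtered = (filter(keep, topic) for topic in topics)   # one filter per topic
    unioned = chain.from_iterable(filtered)                # union of the input flows
    # reduce: one count per (key, window) pair
    return dict(Counter(key_and_window(r) for r in unioned))


kpis = run(topic_a, topic_b, topic_c)
sink = sorted(kpis.items())   # stand-in for the HBase write
print(sink)
```

Each output row pairs an aggregation key and a window start with its count, which is the kind of per-window KPI record that a sink could then persist.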

Page 21: A brief history of time with Apache Flink.pdf

The results

Production metrics

• Low latency (<100ms)

• Input: up to ~80,000 events/sec

• Output: ~40,000 KPIs/window

Page 22: A brief history of time with Apache Flink.pdf

GNOC

The results

Dataviz


Page 23: A brief history of time with Apache Flink.pdf

Benefits

• Monitor and improve customer experience

• Reduced incident detection time

• Helps GNOC prioritize alarms based on customer experience

• Reduced operating costs

Page 24: A brief history of time with Apache Flink.pdf

Difficulties

• Massive amounts of data in both input and output

• Savepoint/Checkpoint cost

• HBase analytic limitations

– Read vs write

– Long scans

• Massive out-of-order events

Page 25: A brief history of time with Apache Flink.pdf

What’s Next

• Flink + Kudu

• Async/Incremental checkpoints

• Flink CEP

• Streaming SQL

• Flink applications monitoring & industrialization.

Page 26: A brief history of time with Apache Flink.pdf

Questions?