A brief history of time with Apache Flink.pdf

Post on 13-Feb-2017

244 views 0 download

Transcript of A brief history of time with Apache Flink.pdf

A Brief History of Time with

Apache Flink

Real time monitoring and analysis with Flink,

Kafka and HBase

Flink Forward 2016Berlin, September, 12th, 2016

1

Thomas LAMIRAULT

Mohamed Amine ABDESSEMED

The speakers

• Software engineer & solution

architect @ Bouygues

Telecom since 2013

• A Flinker since the early

beginnings

@AminouvicTweets

2

Mohamed Amine

ABDESSEMED

• Bigdata Software engineer

@ Bouygues Telecom

since 2015

• A Flink master enthusiast

@thomaslamirault

Thomas

LAMIRAULT

Outline

• Who is Bouygues Telecom

• Data Value and Streaming Analytics

• Use case

• Challenges

• Streaming analytics with Flink

• Results

3

15M Clients

12,1M Mobile subscriber

2,9M Fixed customer

WHO IS BOUYGUES TELECOM ?

Mobile . Fixed . TV . Internet . Cloud

4

First Android TV BOX

Leader

4G/4G+

VoLTE

UHSM

A very Innovative company

WHO IS BOUYGUES TELECOM ?

5

LUX: Logged User eXperienceMobile QoE

• Our Big Data platform

• Produce Mobile QoE indicators based on massive network equipment’s event logs (Billions event/day).

• Goals:– QoE (User) instead of QoS (Machine)

– Real-time Diagnostic (<60’ end-to-end latency)

– Business Intelligence

– Reporting

– Real-time alarming

6

LUXIn Numbers

7

~300 users

30 Flink

productionapps

10Billions

rawevent/day

2 Po storage

~120 users

5 Flink production

apps

4 Billions raw

event/day

750 To storage

2015 Today

Analytic Data value

8Important event

occurrence

Predictiveanalytics

Advanced analytics

Time

Data Value Streaming

analytics

T0 T1 T2 TyT-xBefore important event

NOWAfter important event

DATA IS

MOST

VALUABLE

NOW !

9

Analytic Data value

• Data is most valuable when made

available as soon as important events

occur.

• Get the most of Data

– Collect data fast.

– (Pre)Process it fast.

– Analyze it and create added value to act

faster!

10

The use case

11

GNOC

Identify

Define

ExploreAction

Look backSpecific

Numbers

Emergency Numbers

Call Center

Numbers

Something wrong?

The BIG picture isn’t always significant

Global counts have no sense !

The use case

• A simple and valuable use case

• Need to analyze the entire call traffic :

– Considering multiple aggregation axes

– Fine grained analysis

– Detect when something is happening

somewhere in real-time

– Compare with historical values trends

12

Challenges

• Low latency & streaming fashion counters

• Quickly available KPIs = value

• Massive amounts of data + peak loads

• Reliability

• Multiple flow correlation

• Time management:

– Out of order & late events our worst enemies

– Flexible window management

– Specific watermark emission

13

Streaming Analytics Time

Management

14

8h00 8h30 9h00 9h30Processing Time

Streaming Analytics Time

Management

15

8h00 8h30 9h00 9h30Processing Time

8h09

8h14

8h03

8h00 8h12

8h43

8h45

8h48

9h10

8h00

9h12

8h13

8h47

9h32

9h35

9h14

9h30

8h02

8h03

Event TimeWindow

FIRE

Streaming Analytics with Flink

Built-in windowing functionalities

• Custom Watermark extractor

• Custom Triggers for lateness management

• Custom Key extractor

Stateful Streaming

• Checkpointing

• Fault tolerance

• Savepoints

• Update without data loss

16

Streaming analytics with Flink

Performance

• High throughput

• Low latency

• Excellent memory management

Flexible window management

• Tumbling

• Sliding

• Session

17

High Level Architecture

18

Queue Streaming

ProcessingWindowing

Computation

Collect

Storage

Dataviz

In-Memory Lookups

Alarming

QueueQueue

StreamingStreaming

Architectural details

19

Storage

Kaf

ka c

lust

er

T’1

P1

Px

..

T’n

P1

Px

..

T1

P1

Px

..

Tn

P1

Px

..

Tm

P1

Px

..

T’m

P1

Px

..

Analytic APP

State Backend

Historical Data

Monitoring

Alarming

Streaming application Details

20

TnTnTn Multiple Input Topics

filterfilterfilter Keep only records that interest us

Extract timestampExtract relevant event timestamp, with custom watermark emission

keyByGroup by considered aggregation axes

window Tumbling windows

triggerCustom trigger, allow configurable late data and manage lagged data

sink Write Data into HBase

reduce Produce KPIs

UnionUnion Correlate input flows

The resultsProduction metrics

• Low latency (<100ms)

• Input : Up to ~80.000 events/sec

• Output : Produce ~40.000 KPI/window

21

GNOC

The results

Dataviz

23

Benefits

24

• Monitor and improve customer

experience

• Reduced incident detection time

• Help GNOC alarm prioritization

based on customer experience

• Reduced operating costs

Difficulties

• Massive amounts of data in both input and output

• Savepoint/Checkpoint cost

• HBase analytic limitations

–Read vs Write

–Long Scan

• Massive out of order events

25

What’s Next

26

• Flink + Kudu

• Async/Incremental checkpoints

• Flink CEP

• Streaming SQL

• Flink applications monitoring &

industrialization.

27

Questions ?