Retrospection / prospection and schema

Posted on 08-May-2015


University of Tsukuba intensive lecture materials, 2014/01/31


Retrospection / prospection and schema

TAGOMORI Satoshi (@tagomoris), LINE Corp.

2014/01/31 (Fri) at University of Tsukuba, the 1st half

Friday, January 31, 2014

TAGOMORI Satoshi (@tagomoris), LINE Corp.

Development Support Team


Logs

Service metrics (Users, PageViews, ...)

UX/UI metrics (Access path, Taps/views, ...)

Monitoring metrics (Traffic Gbps, TBytes/day, ...)

System monitoring (Error rates, Response time, ...)


Software for Logging

Collection: Fluentd, Scribed, Flume, LogStash, ...

Storage: RDBMS, Hadoop HDFS, NoSQLs, Elasticsearch, ...

Processing: SQL, Hadoop MapReduce (Hive), Presto, Impala, ...

Stream-Processing: Storm, Kafka, Norikra, ...

Visualization: Kibana, Tableau, Fnordmetric, GrowthForecast, Focuslight, ...

Appliance: DWH + BI Tools

Services: Google BigQuery, Treasure Data, ...


How to inspect logs

Retrospection (reactive search)

Store data, and search

Prospection (proactive search)

Define what should be processed, and store data
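The difference between the two styles can be sketched in a few lines of Python. This is only an illustration: the events, the field names (`path`, `status`) and the "errors per path" question are all made up for the example, and a plain list stands in for real storage.

```python
from collections import Counter

# Hypothetical access-log events, invented for illustration.
EVENTS = [
    {"path": "/top", "status": 200},
    {"path": "/login", "status": 500},
    {"path": "/top", "status": 200},
    {"path": "/login", "status": 200},
]


def retrospect(events):
    """Store everything first, then search the stored data afterwards."""
    stored = list(events)  # stands in for HDFS, an RDBMS, ...
    # The question is decided after the data has been stored.
    return Counter(e["path"] for e in stored if e["status"] >= 500)


def prospect(events):
    """Define the question first, then keep only its running result."""
    errors_by_path = Counter()  # the only state that is kept
    for e in events:  # each event is seen once and then dropped
        if e["status"] >= 500:
            errors_by_path[e["path"]] += 1
    return errors_by_path


retro_result = retrospect(EVENTS)
pro_result = prospect(EVENTS)
```

Both functions give the same answer here. The retrospective one can still answer new questions later because it kept the raw events; the prospective one keeps only the counter.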


What logs are inspected

Schema-full data:

strict schema: predefined fields w/ types (or reject)

schema on read: try to read known fields (or ignore)

Schema-less data:

any fields (or ignore), any types (implicit/explicit conversion)

fits services in development (all internet services!)
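A rough Python sketch of the three ways of handling a record, assuming a toy schema. `SCHEMA`, the function names and the sample record are hypothetical and exist only for this example; real systems implement these ideas very differently.

```python
SCHEMA = {"user_id": int, "path": str}  # hypothetical predefined fields


def strict_schema(record):
    """Accept only records that match the schema exactly, else reject."""
    if set(record) != set(SCHEMA):
        raise ValueError("rejected: unexpected or missing fields")
    for name, typ in SCHEMA.items():
        if not isinstance(record[name], typ):
            raise ValueError(f"rejected: {name} is not {typ.__name__}")
    return record


def schema_on_read(record):
    """Read the known fields and silently ignore everything else."""
    return {name: record.get(name) for name in SCHEMA}


def schema_less(record, wanted):
    """Take any fields; convert types only when a reader asks for them."""
    out = {}
    for name, typ in wanted.items():
        if name in record:
            out[name] = typ(record[name])  # explicit conversion, may raise
    return out


# A record from a service still in development: a new field has appeared
# and user_id arrives as a string.
record = {"user_id": "42", "path": "/top", "agent": "curl"}
on_read = schema_on_read(record)
free = schema_less(record, {"user_id": int, "agent": str})
```

With this record, `strict_schema(record)` raises `ValueError`, `schema_on_read` drops the unknown `agent` field, and `schema_less` returns whichever fields the reader asked for.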


How/what

How \ What    Schema-full                                    Schema-less

Retrospect    RDBMS, Hive, BigQuery, Cassandra, HBase, ...   MongoDB, Hive (SerDe), TD, plain text file, ...

Prospect      Esper, many stream CEP engines, ...            Norikra, ...


Data size: schema & index

Logs: size is always important (xTB - xPB)

Schema:

size optimization

access optimization on memory/disk

Index:

access optimization on memory/disk

more memory/disk required

hard to distribute


Query response improvements of retrospection

Schema-full + indexed (RDBMS)

Query plan optimization

Schema on read

I/O and Task size optimization & scale out

Schema-less + indexed (Mongo)

mmap-ed index & data (!)


Query response improvements of prospection

Time window + incremental calculation

Stream processing engines
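As a minimal sketch of "time window + incremental calculation", the class below counts events per key over fixed, non-overlapping windows. `TumblingCount` and its parameters are hypothetical names for this example, and it assumes events arrive in timestamp order; real stream engines also deal with late events, timers and overlapping windows.

```python
from collections import defaultdict


class TumblingCount:
    """Per-key counts over fixed, non-overlapping time windows."""

    def __init__(self, window_sec):
        self.window_sec = window_sec
        self.window_start = None
        self.counts = defaultdict(int)

    def add(self, timestamp, key):
        """Feed one event; return the closed window's result, if any."""
        start = timestamp - timestamp % self.window_sec
        closed = None
        if self.window_start is not None and start > self.window_start:
            # The previous window is complete: emit it and reset the state.
            closed = (self.window_start, dict(self.counts))
            self.counts.clear()
        if self.window_start is None or start > self.window_start:
            self.window_start = start
        self.counts[key] += 1  # incremental update, no rescan of old events
        return closed


w = TumblingCount(window_sec=60)
results = []
for ts, key in [(0, "a"), (10, "b"), (59, "a"), (60, "a"), (130, "b")]:
    closed = w.add(ts, key)
    if closed is not None:
        results.append(closed)
```

Each event costs one dictionary update, and the memory held is one window of counters, not the events themselves.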


Stream processing and data size

No disks: reduction of failure points

Less memory:

size of just processing and I/O buffers

aggregation results

Easy to distribute:

stream duplication

stream splitting by aggregation key
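Splitting a stream by aggregation key could look like the sketch below. `partition_for` and `split_stream` are hypothetical helpers, and the "workers" are just in-process counters standing in for separate nodes; CRC32 is used only because it is a stable hash available in the standard library.

```python
import zlib
from collections import Counter


def partition_for(key, num_workers):
    """Pick a worker from the aggregation key with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_workers


def split_stream(events, num_workers):
    """Route each event so that one key always lands on one worker."""
    workers = [Counter() for _ in range(num_workers)]
    for key in events:
        workers[partition_for(key, num_workers)][key] += 1
    return workers


events = ["a", "b", "a", "c", "b", "a"]
workers = split_stream(events, num_workers=3)
merged = sum(workers, Counter())  # combine the partial results
```

Because every event with a given key goes to the same worker, each worker's counts for its keys are already final, and merging is a plain sum.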


Stream processing and schema

Stream processing: query -> data

Prospective schema by queries:

Queries know the required fields and their types

Unused fields can be ignored

Implicit type conversion available

Schema-less data + schema-full queries
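The idea of a query that carries its own schema can be sketched in Python as follows. This is not Norikra's API: the `Query` class, its arguments and the sample events are assumptions made for the example.

```python
class Query:
    """A query that carries its own schema: field names and types."""

    def __init__(self, fields, predicate):
        self.fields = fields        # e.g. {"status": int}
        self.predicate = predicate  # called with the typed fields
        self.matched = 0

    def feed(self, event):
        """Apply the query to one schema-less event (a plain dict)."""
        typed = {}
        for name, typ in self.fields.items():
            if name not in event:
                return  # event lacks a required field: not for this query
            try:
                typed[name] = typ(event[name])  # implicit type conversion
            except (TypeError, ValueError):
                return  # value cannot be read as the required type
        if self.predicate(typed):
            self.matched += 1


errors = Query({"status": int}, lambda f: f["status"] >= 500)
for event in [
    {"status": "500", "path": "/login"},  # string converted to int
    {"status": 200, "agent": "curl"},     # unused field is ignored
    {"path": "/top"},                     # no status: skipped
    {"status": "n/a"},                    # not convertible: skipped
    {"status": 503.0},                    # float converted to int
]:
    errors.feed(event)
```

The events have no declared schema; only the query states which fields it needs and how to read them, so new fields can appear in the stream without breaking existing queries.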


My goal:

Schema-less data stream + schema-full queries

It’s Norikra!
