Fluentd: Data streams in Ruby world #rdrc2014

Posted on 08-Sep-2014


Fluentd: Data streams in Ruby world

@tagomoris, RedDotRubyConf 2014

Day 1, Thursday 26 June 2014

TAGOMORI Satoshi a.k.a. @tagomoris


Fluentd

Fluentd is an open source data collector to simplify log management. Fluentd is designed to process high-volume data streams reliably. Use cases include real-time search and monitoring, Big Data analytics, reliable archiving and more.

http://www.fluentd.org/


Before Fluentd:

[Diagram: access logs (apache, nginx) are shipped to Metrics (graphs) and Archives (Amazon S3, filesystem) by ad-hoc tools: tail -f, scp, python scripts]

Error handling? Buffering?

Before Fluentd:

[Diagram: access logs (apache, nginx) now also go to Analytics (Hadoop, MySQL, MongoDB, Redshift), alongside Metrics (graphs) and Archives (Amazon S3, filesystem); more ad-hoc tools appear: tail -f, scp, python, ruby scripts, cmd]

Error handling? Buffering? Routing? API Keys?

Before Fluentd:

[Diagram: app logs (frontend, backend) join the access logs (apache, nginx); they reach Metrics, Analytics and Archives through yet more tools: files, ruby scripts, ruby loggers, cmd]

Error handling? Buffering? Routing? API Keys? Formats?

Before Fluentd:

[Diagram: system logs (syslogd, snmp data) join the access logs and app logs; each source-destination pair has its own tool (tail -f, scp, python, ruby, cmd, file, logger)]

Error handling? Buffering? Routing? API Keys? Formats?

Before Fluentd: CHAOS

[Diagram: access logs, app logs, system logs and various other logs are wired to Metrics (graphs), Analytics (Hadoop, MySQL, MongoDB, Redshift) and Archives (Amazon S3, filesystem) through a tangle of ad-hoc tools: tail -f, scp, python, ruby, cmd, file, logger]

Error handling? Buffering? Routing? API Keys? Formats?

After Fluentd: Controllable

[Diagram: the same sources (access logs from apache/nginx, app logs from frontend/backend, system logs from syslogd/snmp data, various logs) and destinations (Metrics: graphs; Analytics: Hadoop, MySQL, MongoDB, Redshift; Archives: Amazon S3, filesystem), now all connected through Fluentd]

Fluentd does: Format, Buffer, Retry, Route

Fluentd: Open source data collector

Written in Ruby, runs on CRuby on UNIX-like OSes
With error handling and routing in core

Plugin system: Input, Output and Buffer plugins (many built-in plugins)

Distributed on rubygems.org: Fluentd and its plugins
gem install fluentd
rpm/deb packages are also available (td-agent)

Why Fluentd? Fluentd's logo is very cute!

He is also very cute...


Why Fluentd?

Simple data structure: tag, time and record (hash)

Apache-like configuration syntax

Simple / powerful routing

Many public plugins

Just a few steps to write custom plugins

Scalability

Fluentd Event

app.device.ios
2014-06-24 16:28:50
{ "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ... }

Fluentd Event

tag: app.device.ios (tag for routing)

time: 1403512916, i.e. 2014-06-23 16:41:56 +0800 (time as unix time)

record: { "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ... } (structured data)
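To make the tag/time/record triple concrete, here is a minimal Ruby sketch. The `Event` struct is a hypothetical illustration, not a Fluentd class; it holds the example event above and renders its unix time in the +0800 offset used on the slide.

```ruby
# Hypothetical struct for illustration only; Fluentd passes tag, time
# and record around as separate values rather than as this class.
Event = Struct.new(:tag, :time, :record) do
  # Render the unix time in a given UTC offset, e.g. "+08:00".
  def time_string(utc_offset)
    Time.at(time).getlocal(utc_offset).strftime("%Y-%m-%d %H:%M:%S %z")
  end
end

event = Event.new(
  "app.device.ios",            # tag: used for routing
  1403512916,                  # time: unix time in seconds
  {                            # record: structured data (a Hash)
    "username" => "tagomoris",
    "fullname" => "TAGOMORI Satoshi",
    "age"      => 34,
    "device"   => "iPhone 5"
  }
)

puts event.time_string("+08:00")  # prints 2014-06-23 16:41:56 +0800
```

Keeping time as a plain integer means events can be compared and serialized cheaply; converting to a human-readable zone is only a presentation step.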

Fluentd Configuration

# read from a file and parse
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB and S3
<match app.**>
  type copy
  <store>
    type mongo
    host mongo.example.com
    capped
    capped_size 200m
  </store>
  <store>
    type s3
    path archive/
  </store>
</match>

<source> sections are for input, <match> sections are for output.

Fluentd Configuration DSL

# read from a file and parse
source {
  type "tail"
  path "/var/log/httpd.log"
  format "apache2"
  tag "web.access"
}

# logs from client libraries
source {
  type "forward"
  port 24224
}

# store logs to MongoDB and S3
match("app.**") {
  type "copy"
  store {
    type "mongo"
    host "mongo.example.com"
    capped
    capped_size "200m"
  }
  store {
    type "s3"
    path "archive/"
  }
}

Tag based routing

[Diagram: inputs emit events (tag, time, record) into the Fluentd core, which routes each event to the output whose pattern matches its tag: web.log, sys.*, app.**, or the catch-all **. An output can also emit converted events under a new tag (e.g. converted.web.log) back into the core for routing.]
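A minimal sketch of tag matching, assuming the usual glob rules (`*` matches exactly one dot-separated tag part, `**` matches zero or more parts) and first-match-wins routing over the patterns in configuration order. `tag_match?`, `match_parts?`, `route` and `PATTERNS` are hypothetical names; Fluentd's own matcher also supports more syntax (such as `{a,b}` alternatives) that this sketch leaves out.

```ruby
# Simplified, hypothetical tag matcher (not Fluentd's own code):
# "*" matches exactly one tag part, "**" matches zero or more parts,
# and any other pattern part must equal the tag part literally.
def tag_match?(pattern, tag)
  match_parts?(pattern.split("."), tag.split("."))
end

def match_parts?(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?

  head, *rest = pattern_parts
  case head
  when "**"
    # Let "**" consume 0, 1, 2, ... tag parts and try the rest.
    (0..tag_parts.size).any? { |n| match_parts?(rest, tag_parts.drop(n)) }
  when "*"
    !tag_parts.empty? && match_parts?(rest, tag_parts.drop(1))
  else
    !tag_parts.empty? && head == tag_parts.first &&
      match_parts?(rest, tag_parts.drop(1))
  end
end

# Route a tag to the first matching pattern, mirroring the order of
# <match> sections in a configuration file.
def route(tag, patterns)
  patterns.find { |pattern| tag_match?(pattern, tag) }
end

PATTERNS = ["web.log", "sys.*", "app.**", "**"]

route("app.device.ios", PATTERNS)     # => "app.**"
route("sys.cpu", PATTERNS)            # => "sys.*"
route("sys.cpu.load", PATTERNS)       # => "**" ("*" matches one part only)
route("converted.web.log", PATTERNS)  # => "**"
```

Because the first match wins, a catch-all `**` only makes sense as the last pattern; placed first, it would swallow every event.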

300+ Public Plugins

access, add, aes-forward, airbrake-python, amazon_sns, amplifier-filter, amqp, amqp2, andon, anomalydetect, anonymizer, arango, arduino, axlsx, backlog, bigquery, boundio, buffer-lightening, buffered-filter, buffered-hipchat, buffered-stdout, bufferize, calc, cassandra, cassandra-cql, cloudstack, cloudwatch, cloudwatch_ya, combiner, conditional_filter, config-expander, config_pit, config_reloader, convert-value-to-sha, copy_ex, couch, couch-sharded, couchbase, dashing, data-rejecter, datacalculator, datacounter, dbi, dd, debug, delay-inspector, delayed, derive, df, droonga, dstat, dummydata-producer, dynamodb, ec2-metadata, elapsed-time, elasticsearch, elasticsearch-cluster, elasticsearch-ruby, elb-log, embedded-elasticsearch, eval-filter, event-tail, extract_query_params, file-alternative, file-sprintf, filter, filter_keys, flatten, flatten-hash, flowcounter, flowcounter-simple, flume, fnordmetric, forest, fork, format, forward-aws, ftp, gamobile, ganglia, gc, geoip, glusterfs, graphite, grassland, gree_community, grep, grepcounter, groonga, groupcounter, growl, growthforecast, gstore, hash-forward, hato, hbase, hekk_redshift, heroku-postgres, heroku-syslog, hipchat, histogram, hoop, hostname, hrforecast, http-enhanced, http-ex, http-list, http-status, https-json, idobata, ikachan, imagefile, imkayac, in-udp-event, incremental, influxdb, influxdb_metrics, inline-classifier, irc, jabber, json-api, json-nest2flat, jsonbucket, jstat, jubatus, jvmwatcher, kafka, kanicounter, keep-forward, kestrel, kibana-server, kinesis-alt, latency, leftronic, librato-metrics, loggly, lossycount, mackerel, mail, map, measure_time, mecab, metricsense, mixi_community, mixpanel, mobile-carrier, mongo, mongo-typed, mongokpi, mqtt, msgpack-rpc, mssql, multiprocess, munin, mysql, mysql-binlog, mysql-bulk, mysql-load, mysql-prepared-statement, mysql-query, mysql-replicator, mysqlslowquery, mysqlslowquerylog, nats, network-probe, nginx-status, nicorepo, norikra, notifier, numeric-counter, numeric-monitor, onlineuser, openldap-monitor, opentsdb, order, out-http, out-http-buffered, out-solr, parser, pgdist, pghstore, pgjson, ping-message, postgres, qqwry, rambler, rawexec, rds-log, rds-slowlog, reassemble, record ...

http://www.fluentd.org/plugins

Fluentd patterns


1. Read logs from files and write them to storage

[Diagrams: file -> in_tail (read, parse) -> an output plugin -> storage]

out_file (format, write) -> file

out_mongo (insert) -> MongoDB
https://github.com/fluent/fluent-plugin-mongo

out_mysql (insert) -> MySQL
https://github.com/tagomoris/fluent-plugin-mysql

out_elasticsearch (send) -> Elasticsearch
https://github.com/uken/fluent-plugin-elasticsearch

out_webhdfs (format, write) -> Hadoop HDFS
https://github.com/fluent/fluent-plugin-webhdfs

out_s3 (format, write) -> Amazon S3
https://github.com/fluent/fluent-plugin-s3

out_redshift (insert) -> Amazon Redshift
https://github.com/hapyrus/fluent-plugin-redshift

out_bigquery (insert) -> Google BigQuery
https://github.com/tagomoris/fluent-plugin-bigquery
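All of these patterns start with in_tail reading a file and parsing each line into a record. The sketch below shows that parse step for an Apache combined-format line, assuming a simplified regexp of our own; it is not the exact regexp behind the built-in `apache2` format, and `parse_apache` / `APACHE_LINE` are hypothetical names.

```ruby
require "time"

# Simplified regexp for the Apache combined log format (hypothetical;
# not the exact regexp used by in_tail's "apache2" format).
APACHE_LINE = /\A(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) \S+" (?<code>\d{3}) (?<size>\S+)(?: "(?<referer>[^"]*)" "(?<agent>[^"]*)")?\z/

# Parse one log line into [unix_time, record], roughly the shape of what
# in_tail emits (together with the tag from its configuration).
# Returns nil when the line does not match.
def parse_apache(line)
  m = APACHE_LINE.match(line.chomp)
  return nil unless m

  time = Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z").to_i
  # Every named capture except "time" becomes a record field.
  record = m.names.each_with_object({}) do |name, h|
    h[name] = m[name] unless name == "time"
  end
  [time, record]
end
```

Usage: `time, record = parse_apache(line)` yields an integer unix time and a Hash such as `{"host" => "127.0.0.1", "code" => "200", ...}`, which any of the outputs above can then format or insert.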

2. Receive and forward data from/to other nodes

[Diagram: client libraries (fluent-logger-ruby, fluent-logger-java, ...) send events over TCP to a Fluentd node's forward input; nodes pass events on to other nodes, forward output -> forward input]

[Diagram: one forward output feeding several forward nodes: load balance, active-standby forward]
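A forward output can spread events over several nodes or keep some of them as standbys. The sketch below is a simplified, hypothetical model of that behavior: weighted round robin over available active nodes, falling back to standby nodes only when no active node is available. It is not out_forward's actual algorithm (which also detects node failures with heartbeats), and `Node`, `ForwardBalancer` and the node names are made up for illustration.

```ruby
# Hypothetical node description: name, relative weight, standby flag,
# and whether the node is currently considered available.
Node = Struct.new(:name, :weight, :standby, :available)

class ForwardBalancer
  def initialize(nodes)
    @nodes = nodes
    @counter = 0
  end

  # Pick the destination for the next chunk of events.
  def next_node
    active = @nodes.select { |n| !n.standby && n.available }
    # Use standby nodes only when every active node is down.
    pool = active.empty? ? @nodes.select { |n| n.standby && n.available } : active
    raise "no available nodes" if pool.empty?

    # Weighted round robin: a node with weight 2 appears twice per cycle.
    slots = pool.flat_map { |n| [n] * n.weight }
    node = slots[@counter % slots.size]
    @counter += 1
    node
  end
end

nodes = [
  Node.new("agg1", 1, false, true),
  Node.new("agg2", 1, false, true),
  Node.new("backup", 1, true, true)
]
balancer = ForwardBalancer.new(nodes)
3.times.map { balancer.next_node.name }  # => ["agg1", "agg2", "agg1"]
```

Marking `agg1` and `agg2` as unavailable (`node.available = false`) makes every subsequent call return `backup`, which is the active-standby case from the slide.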

2'. Receive and forward data from/to other nodes, over the internet & SSL

[Diagram: secure-forward nodes in different datacenters send events between datacenters over SSL with authentication]

https://github.com/tagomoris/fluent-plugin-secure-forward

3. Connect with other middleware

Input: syslog -> in_syslog, Flume -> in_flume, Scribe -> in_scribe, Kafka -> in_kafka

Output: out_flume -> Flume, out_scribe -> Scribe, out_kafka -> Kafka

https://github.com/fluent/fluent-plugin-flume
https://github.com/fluent/fluent-plugin-scribe
https://github.com/htgc/fluent-plugin-kafka/

4. Copy events

[Diagram: forward input -> copy -> both a forward output (to another node) and webhdfs (Hadoop HDFS)]
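The copy output simply hands each event to several stores. A minimal sketch, assuming a hypothetical `emit(tag, time, record)` interface that is much simpler than Fluentd's real output plugin API; `MemoryOutput` stands in for the forward and webhdfs outputs in the diagram.

```ruby
# Hypothetical stand-in for an output plugin: it just keeps events in memory.
class MemoryOutput
  attr_reader :events

  def initialize
    @events = []
  end

  def emit(tag, time, record)
    @events << [tag, time, record]
  end
end

# Sketch of the copy idea: every event goes to every store.
class CopyOutput
  def initialize(*stores)
    @stores = stores
  end

  def emit(tag, time, record)
    # Shallow-copy the record per store so that top-level changes made by
    # one store do not leak into what another store received.
    @stores.each { |store| store.emit(tag, time, record.dup) }
  end
end

forward = MemoryOutput.new
hdfs = MemoryOutput.new
copy = CopyOutput.new(forward, hdfs)
copy.emit("app.access", 1403512916, { "path" => "/" })
```

After the call, both `forward.events` and `hdfs.events` hold the same event, so one stream can be forwarded to another node and archived at the same time.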

5. Count events by string values

[Diagram: events -> forward -> datacounter (count records by regexp patterns) -> any outputs]

{ "pattern1_count": 60, "pattern1_rate": 1.0, "pattern2_count": 20, "pattern2_rate": 0.33, ... }

https://github.com/tagomoris/fluent-plugin-datacounter
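A sketch of the counting idea behind datacounter, under stated assumptions: each record is counted under the first regexp that matches a chosen field, and the rate is the count divided by the aggregation interval in seconds, rounded to two decimals. With 60 and 20 matches over 60 seconds, this reproduces the 1.0 and 0.33 rates shown above. `count_by_patterns` is a hypothetical helper, not the plugin's code.

```ruby
# Hypothetical helper: count records by regexp patterns over one interval.
#   records      - Array of Hash records
#   field        - the record key whose value is tested
#   patterns     - Hash of name => Regexp, tried in order
#   interval_sec - length of the aggregation interval in seconds
def count_by_patterns(records, field, patterns, interval_sec)
  counts = Hash.new(0)
  records.each do |record|
    value = record[field].to_s
    # Count the record under the first pattern that matches it (assumption).
    name = patterns.find { |_, regexp| regexp.match?(value) }&.first
    counts[name] += 1 if name
  end

  patterns.keys.each_with_object({}) do |name, result|
    result["#{name}_count"] = counts[name]
    result["#{name}_rate"] = (counts[name].to_f / interval_sec).round(2)
  end
end

records = Array.new(60) { { "status" => "200" } } +
          Array.new(20) { { "status" => "404" } }
count_by_patterns(records, "status", { "2xx" => /\A2\d\d\z/, "4xx" => /\A4\d\d\z/ }, 60)
```

The result holds `"2xx_count" => 60, "2xx_rate" => 1.0, "4xx_count" => 20, "4xx_rate" => 0.33`, the same shape as the slide's example output.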

5. Count events by numeric values

[Diagram: events -> forward -> numeric-counter (count records by numerical range) -> any outputs]

https://github.com/tagomoris/fluent-plugin-numeric-counter

{ "pattern1_count": 60, "pattern1_rate": 1.0, "pattern2_count": 20, "pattern2_rate": 0.33, ... }
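The same idea with numeric ranges instead of regexps. In this sketch (a hypothetical `count_by_ranges`; the range names and boundaries are made up), each numeric value is counted under the first Ruby `Range` that covers it, and non-numeric values are skipped.

```ruby
# Hypothetical helper: count records whose numeric field falls in each range.
#   ranges - Hash of name => Range, tried in order
def count_by_ranges(records, field, ranges)
  counts = ranges.keys.each_with_object({}) { |name, h| h[name] = 0 }
  records.each do |record|
    value = record[field]
    next unless value.is_a?(Numeric)

    name = ranges.find { |_, range| range.cover?(value) }&.first
    counts[name] += 1 if name
  end
  counts.transform_keys { |name| "#{name}_count" }
end

RESPONSE_TIME_RANGES = {
  "fast"   => 0...100,               # under 100 ms
  "normal" => 100...1000,            # 100 ms up to 1 s
  "slow"   => 1000..Float::INFINITY  # 1 s and above
}

records = [5, 40, 120, 800, 2000].map { |ms| { "response_ms" => ms } }
count_by_ranges(records, "response_ms", RESPONSE_TIME_RANGES)
```

Here the result is two fast, two normal and one slow request; a rate per second could be added exactly as in the regexp-based sketch.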

5. Aggregate numeric values

[Diagram: events -> forward -> numeric-monitor (calculate real-time metrics of numeric values) -> any outputs]

{ "max": 128, "min": 16, "avg": 51.2, "sum": 1024, "num": 20, "percentile_50": 48, "percentile_90": 112, ... }

https://github.com/tagomoris/fluent-plugin-numeric-monitor
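A sketch of the statistics numeric-monitor reports. The percentile definition is an assumption: this version uses the nearest-rank method with integer percentiles, which may differ from the plugin's own calculation. `numeric_stats` is a hypothetical helper.

```ruby
# Hypothetical helper: aggregate a batch of numeric values.
# Percentiles must be integers (e.g. 50, 90) and use the nearest-rank method.
def numeric_stats(values, percentiles: [50, 90])
  return {} if values.empty?

  sorted = values.sort
  sum = sorted.sum
  stats = {
    "max" => sorted.last,
    "min" => sorted.first,
    "avg" => sum.to_f / sorted.size,
    "sum" => sum,
    "num" => sorted.size
  }
  percentiles.each do |p|
    # Nearest rank = ceil(p * n / 100), computed with integer arithmetic.
    rank = (p * sorted.size + 99) / 100
    stats["percentile_#{p}"] = sorted[[rank, 1].max - 1]
  end
  stats
end

numeric_stats((1..10).to_a)
```

For the values 1 to 10 this gives max 10, min 1, avg 5.5, sum 55, num 10, percentile_50 = 5 and percentile_90 = 9. Note that avg is always sum / num, which is a quick sanity check for any output of this shape.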

6. Various inputs: Linux performance (dstat)

[Diagram: dstat -> in_dstat: collect server performance data]

https://github.com/shun0102/fluent-plugin-dstat

6. Various inputs: SQL execution

[Diagram: RDBMS -> in_sql: input from SELECT]

https://github.com/fluent/fluent-plugin-sql

6. Various inputs: external command

[Diagram: any commands -> in_exec: input from STDOUT of any commands]

7. Various outputs: notification on IRC

[Diagram: out_ikachan -> IRC: notice on IRC channel]

https://github.com/tagomoris/fluent-plugin-ikachan

14:56 ikachan: HTTP status_4xx crit [2014-06-23 14:56:29 +0900] serviceX: 100.00 (threshold 75.0) http://graph.tool.local/view_graph/accesslog/httpstatus/serviceX_4xx_percentage
14:57 kazeburo: ↑ 40x 100%...

7. Various outputs: notification on HipChat

[Diagram: out_hipchat -> HipChat: notice on HipChat]

https://github.com/hotchpotch/fluent-plugin-hipchat

7. Various outputs: graph tools

[Diagram: out_growthforecast -> GrowthForecast or Focuslight: POST data into graph tools]

https://github.com/tagomoris/fluent-plugin-growthforecast

7. Various outputs: external command

[Diagram: out_exec -> any commands: output into STDIN of any commands]

8. Filters: stream processing with an external command

[Diagram: any inputs -> exec_filter -> any outputs. exec_filter formats events and writes them into the STDIN of any command; the command reads from STDIN, does WHATEVER you want, and writes into STDOUT; exec_filter reads and parses that STDOUT back into events.]

ex: tail -f | grep ... | sed ... | cat
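The exec_filter contract is just "events in on STDIN, events out on STDOUT". A minimal sketch, assuming JSON lines as the framing and a Ruby child process as the external command. The real plugin keeps long-running child processes and has its own format options, while this hypothetical `run_exec_filter` spawns one process per batch for simplicity; `FILTER_SCRIPT` and its `status_class` field are made up for illustration.

```ruby
require "json"
require "open3"
require "rbconfig"

# Hypothetical external filter program (run via `ruby -e`): reads one JSON
# record per line from STDIN, adds a field, writes the record to STDOUT.
FILTER_SCRIPT = <<~'RUBY'
  require "json"
  STDIN.each_line do |line|
    record = JSON.parse(line)
    record["status_class"] = "#{record["status"].to_s[0]}xx"
    puts JSON.generate(record)
  end
RUBY

# Format records as JSON lines, pipe them into the command's STDIN,
# then parse the command's STDOUT back into records.
def run_exec_filter(records, script = FILTER_SCRIPT)
  return [] if records.empty?

  input = records.map { |record| JSON.generate(record) }.join("\n") + "\n"
  output, status = Open3.capture2(RbConfig.ruby, "-e", script, stdin_data: input)
  raise "filter command failed: #{status}" unless status.success?

  output.each_line.map { |line| JSON.parse(line) }
end
```

Calling `run_exec_filter([{ "status" => 404 }])` returns the same record with `"status_class" => "4xx"` added. Because the command only sees text on STDIN and STDOUT, it could be written in any language, just like the `tail -f | grep ... | sed ...` pipeline on the slide.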

8. Filters: stream processing with an external server via RPC

[Diagram: any inputs -> out_norikra (send) -> Norikra (stream processing with SQL) -> in_norikra (fetch) -> any outputs]

http://norikra.github.io/

SELECT stage, score, COUNT(*) AS c
FROM results.win:time_batch(1 min)
WHERE stage > 1 AND user.valid
GROUP BY stage, score

... And, Fluentd does error handling and retries for all of these plugins!
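A minimal sketch of retry with exponential backoff for one buffered chunk. The `retry_wait` and `retry_limit` names follow Fluentd's buffered-output parameters as we understand them, but the loop is a simplified illustration: `write_with_retry` is hypothetical, and the `sleeper` argument exists only so the waits can be observed without actually sleeping.

```ruby
# Try to write one chunk; on failure wait retry_wait, then 2x, 4x, ...
# and give up (re-raise) after retry_limit retries.
def write_with_retry(chunk, retry_wait: 1.0, retry_limit: 5,
                     sleeper: ->(sec) { sleep(sec) })
  attempts = 0
  begin
    yield chunk
  rescue StandardError
    attempts += 1
    raise if attempts > retry_limit

    # Exponential backoff: retry_wait * 2**(attempts - 1)
    sleeper.call(retry_wait * (2**(attempts - 1)))
    retry
  end
end

# Example: a destination that fails twice, then accepts the chunk.
calls = 0
write_with_retry(["event1", "event2"]) do |chunk|
  calls += 1
  raise IOError, "temporary failure" if calls < 3
  chunk.size
end
```

In this example the writer sleeps 1.0 and then 2.0 seconds before the third attempt succeeds and returns 2. Doubling the wait keeps a struggling destination from being hammered, while buffered events wait instead of being dropped.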


Fluentd: Now and then


Fluentd versions

Latest: v0.10.50, released on Jun 17, 2014

v0.10.x: stable versions
many minor feature updates, bug fixes
new features for v1

Fluentd v1: planned as the first major release

someday in 2014 (?)

100% compatible with v0.10.x

New (and additional) features on the v1.x roadmap:
https://github.com/fluent/fluentd/issues/251

new configuration syntax, plugin backends

daemon process management

multi-core CPU support

Fluentd on JRuby

Under development!

trying to fix Cool.io to support JRuby


Fluentd on Windows

Under development!

"windows" branch of fluent/fluentd on GitHub

Use case in LINE


Analytics data flow overview

[Diagram: servers -> Fluentd cluster (delivery / stream-map) -> archive (Hadoop), visualization and notifications; Norikra handles aggregate / stream-reduce; application metrics also flow in. fluent-agent-lite on the servers ships non-parsed raw logs, including non-parsed access logs.]

Roles in the diagram:
deliver: receive/archive/load-balance
worker: parse/store/forward
watcher: monitor/notify
cep: general-purpose stream processing

Fluentd cluster statistics

Fluentd nodes:
access/application logs from 600+ nodes
receiver: 5 servers (60 processes)
parser/converter: 10 servers (90 processes)
stream processing: 3 servers

Fluentd cluster statistics

Daily: 5.5+ billion events, 1.5TB+ data

Peak time: 150,000+ events/sec, 300+ Mbps
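A quick back-of-the-envelope check on these figures, treating the "+" values as exact and assuming decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/sec): the daily total averages about 64k events/sec, so the peak is roughly 2.4 times the average, and both the daily volume and the peak bandwidth work out to events of roughly 250 to 275 bytes each. The helper names are hypothetical.

```ruby
SECONDS_PER_DAY = 86_400

# Average events per second over a whole day.
def average_events_per_sec(events_per_day)
  events_per_day.to_f / SECONDS_PER_DAY
end

# Average event size in bytes, from a byte total and an event count.
def bytes_per_event(total_bytes, events)
  total_bytes.to_f / events
end

daily_events = 5.5e9    # "5.5+ billion events" (taken as exact)
daily_bytes  = 1.5e12   # "1.5TB+", assuming 1 TB = 10**12 bytes
peak_events  = 150_000  # events/sec at peak
peak_bits    = 300e6    # "300+ Mbps", assuming 1 Mbps = 10**6 bits/sec

average_events_per_sec(daily_events).round         # => 63657
bytes_per_event(daily_bytes, daily_events).round   # => 273
bytes_per_event(peak_bits / 8, peak_events).round  # => 250
```

The two size estimates agreeing (about 250 to 275 bytes per event) suggests the daily and peak figures describe the same kind of traffic.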

Fluentd is the best partner for stream-processing newbies and rubyists!

Check out the site and code!
http://fluentd.org/
https://github.com/fluent/fluentd

FAQ


Fault-tolerance?

Node-level fault-tolerance
File buffer: data being processed can be serialized on disk

Cluster-level fault-tolerance
Copy + Forward (load balance, active-standby)

Event-level assurance: ACK?
No (for performance reasons)
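A minimal sketch of the file-buffer idea: buffered events are appended to a file, so a restarted process can read them back and flush them later. This analogy assumes JSON lines in a single file; Fluentd's file buffer uses its own chunk files and serialization, and `FileBufferSketch` is a hypothetical class.

```ruby
require "json"

# Hypothetical file-backed buffer: one JSON line per [tag, time, record].
class FileBufferSketch
  def initialize(path)
    @path = path
  end

  # Append one event; the file is closed (and flushed) when the block ends.
  def push(tag, time, record)
    File.open(@path, "a") { |f| f.puts(JSON.generate([tag, time, record])) }
  end

  # Read back every buffered event, e.g. after a process restart.
  def events
    return [] unless File.exist?(@path)

    File.readlines(@path, chomp: true).map { |line| JSON.parse(line) }
  end

  # Drop buffered events once they have been written downstream.
  def clear
    File.delete(@path) if File.exist?(@path)
  end
end
```

Usage: push events with `buffer.push("app.access", 1403512916, { "path" => "/" })`; a new `FileBufferSketch` created on the same path after a restart sees the same events through `#events`, and `#clear` is called only after a successful flush.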

Performance?

NOT SO BAD:

real throughput depends on plugin/configuration

simple event transferring: 10-20k events/sec


vs Scribe? vs Flume?


vs Storm?


Eco-system? Clones?

ik

fluent-agent-lite

fluenpy
