Fluentd: Data streams in Ruby world
@tagomoris, RedDotRubyConf 2014
Day 1, 26 June 2014
Thursday, 26 June 2014
TAGOMORI Satoshi a.k.a. @tagomoris
Fluentd
Fluentd is an open source data collector to simplify log management. Fluentd is designed to process high-volume data streams reliably. Use cases include real-time search and monitoring, Big Data analytics, reliable archiving and more.
http://www.fluentd.org/
Before Fluentd:
Access logs: apache, nginx
Metrics: graphs
Archives: Amazon S3, Filesystem
In between: tail -f, scp, python
Error handling? Buffering?
Before Fluentd:
Access logs: apache, nginx
Metrics: graphs
Analytics: Hadoop, MySQL, MongoDB, Redshift
Archives: Amazon S3, Filesystem
In between: tail -f, scp, python, ruby, cmd
Error handling? Buffering? Routing? API Keys?
Before Fluentd:
Access logs: apache, nginx
App logs: frontend, backend
Metrics: graphs
Analytics: Hadoop, MySQL, MongoDB, Redshift
Archives: Amazon S3, Filesystem
In between: tail -f, scp, python, ruby, cmd, file, logger
Error handling? Buffering? Routing? API Keys? Formats?
Before Fluentd:
Access logs: apache, nginx
App logs: frontend, backend
System logs: syslogd, snmp data
Metrics: graphs
Analytics: Hadoop, MySQL, MongoDB, Redshift
Archives: Amazon S3, Filesystem
In between: tail -f, scp, python, ruby, cmd, file, logger
Error handling? Buffering? Routing? API Keys? Formats?
Before Fluentd: CHAOS
Access logs: apache, nginx
App logs: frontend, backend
System logs: syslogd, snmp data
Various logs
Metrics: graphs
Analytics: Hadoop, MySQL, MongoDB, Redshift
Archives: Amazon S3, Filesystem
In between: tail -f, scp, python, ruby, cmd, file, logger
Error handling? Buffering? Routing? API Keys? Formats?
After Fluentd: Controllable
Access logs: apache, nginx
App logs: frontend, backend
System logs: syslogd, snmp data
Various logs
Metrics: graphs
Analytics: Hadoop, MySQL, MongoDB, Redshift
Archives: Amazon S3, Filesystem
After Fluentd: Controllable
Access logs: apache, nginx
App logs: frontend, backend
System logs: syslogd, snmp data
Various logs
Metrics: graphs
Analytics: Hadoop, MySQL, MongoDB, Redshift
Archives: Amazon S3, Filesystem
Fluentd does: Format, Buffer, Retry, Route
Fluentd
Open source data collector
Written in Ruby, runs on CRuby on UNIX-like OS
With error handling and routing in core
Plugin systems
Input, Output and Buffer (w/ many built-in plugins)
Distributed on rubygems.org
Fluentd and its plugins: gem install fluentd
rpm/deb are also available (td-agent)
Why Fluentd?
Fluentd’s logo is very cute!
He is also very cute...
Why Fluentd?
Simple data structure
tag, time and record (hash)
Apache-like configuration syntax
Simple / powerful routing
Many public plugins
Just a few steps for custom plugins
Scalability
Fluentd Event
app.device.ios
2014-06-24 16:28:50
{ "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ...}
Fluentd Event
tag: app.device.ios
time: 1403512916 (2014-06-23 16:41:56 +0800)
record: { "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ...}
Fluentd Event
tag (for routing): app.device.ios
time (by unix time): 1403512916 (2014-06-23 16:41:56 +0800)
record (structured data): { "username": "tagomoris", "fullname": "TAGOMORI Satoshi", "age": 34, "device": "iPhone 5", ...}
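The three parts of an event can be modeled in plain Ruby. This is only an illustrative sketch: `Event` and `SAMPLE_EVENT` are hypothetical names made up for this example, not classes or constants provided by Fluentd, and the record holds only the fields visible on the slide.

```ruby
require 'time'

# Hypothetical value object mirroring the slide: tag, time and record.
# Fluentd does not ship a class with this name; it is for illustration only.
Event = Struct.new(:tag, :time, :record)

SAMPLE_EVENT = Event.new(
  'app.device.ios',   # tag: dot-separated string used for routing
  1403512916,         # time: unix time (seconds since the epoch)
  {                   # record: structured data as a plain Hash
    'username' => 'tagomoris',
    'fullname' => 'TAGOMORI Satoshi',
    'age'      => 34,
    'device'   => 'iPhone 5'
  }
)

# The unix time converts back to a wall-clock time in any zone.
SAMPLE_EVENT_UTC = Time.at(SAMPLE_EVENT.time).utc.iso8601

# The tag splits into parts, which is what tag-based routing works on.
SAMPLE_EVENT_TAG_PARTS = SAMPLE_EVENT.tag.split('.')
```

Keeping the time as an integer and the record as a Hash is what makes an event cheap to serialize and pass between nodes.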
Fluentd Configuration
# read from a file and parse
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB and S3
<match app.**>
  type copy
  <store>
    type mongo
    host mongo.example.com
    capped
    capped_size 200m
  </store>
  <store>
    type s3
    path archive/
  </store>
</match>
Fluentd Configuration
<source>: for input
<match>: for output
Fluentd Configuration DSL
# read from a file and parse
source {
  type "tail"
  path "/var/log/httpd.log"
  format "apache2"
  tag "web.access"
}

# logs from client libraries
source {
  type "forward"
  port 24224
}

# store logs to MongoDB and S3
match("app.**") {
  type "copy"
  store {
    type "mongo"
    host "mongo.example.com"
    capped
    capped_size "200m"
  }
  store {
    type "s3"
    path "archive/"
  }
}
Tag based routing
inputs -> core (tag, time, record) -> outputs
match patterns: web.log, sys.*, app.**, **
Tag based routing
inputs -> core (tag, time, record) -> outputs
match patterns: web.log, sys.*, app.**, **
events re-emitted into core with a new tag: converted.web.log
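How the patterns on this slide could select an output can be sketched in a few lines of Ruby. This is a simplified, hypothetical matcher, not Fluentd's own implementation: it assumes `*` matches exactly one tag part, `**` matches zero or more tag parts, and the first matching pattern wins. The names `parts_match?`, `tag_match?`, `route` and `ROUTES` are invented for this example.

```ruby
# Patterns from the slide, in the order they would be tried.
ROUTES = ['web.log', 'sys.*', 'app.**', '**'].freeze

# Recursively compare pattern parts against tag parts.
def parts_match?(pattern_parts, tag_parts)
  return tag_parts.empty? if pattern_parts.empty?

  head, *rest = pattern_parts
  case head
  when '**'
    # "**" may consume zero or more tag parts.
    (0..tag_parts.length).any? { |n| parts_match?(rest, tag_parts.drop(n)) }
  when '*'
    # "*" must consume exactly one tag part.
    !tag_parts.empty? && parts_match?(rest, tag_parts.drop(1))
  else
    # A literal part must be equal to the next tag part.
    tag_parts.first == head && parts_match?(rest, tag_parts.drop(1))
  end
end

def tag_match?(pattern, tag)
  parts_match?(pattern.split('.'), tag.split('.'))
end

# Return the first pattern that matches the tag (assumed first-match-wins).
def route(tag, routes = ROUTES)
  routes.find { |pattern| tag_match?(pattern, tag) }
end
```

Under these assumptions `route('sys.cpu')` picks `sys.*`, while `route('converted.web.log')` falls through to the catch-all `**`.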
300+ Public Plugins
access, add, aes-forward, airbrake-python, amazon_sns, amplifier-filter, amqp, amqp2, andon, anomalydetect, anonymizer, arango, arduino, axlsx, backlog, bigquery, boundio, buffer-lightening,
buffered-filter, buffered-hipchat, buffered-stdout, bufferize, calc, cassandra, cassandra-cql, cloudstack, cloudwatch, cloudwatch_ya, combiner, conditional_filter, config-expander, config_pit, config_reloader, convert-value-to-sha, copy_ex, couch, couch-sharded,
couchbase, dashing, data-rejecter, datacalculator, datacounter, dbi, dd, debug, delay-inspector, delayed, derive, df, droonga, dstat, dummydata-producer, dynamodb, ec2-metadata, elapsed-time, elasticsearch, elasticsearch-cluster, elasticsearch-ruby, elb-log, embedded-elasticsearch,
eval-filter, event-tail, extract_query_params, file-alternative, file-sprintf, filter, filter_keys, flatten, flatten-hash, flowcounter, flowcounter-simple, flume,
fnordmetric, forest, fork, format, forward-aws, ftp, gamobile, ganglia, gc, geoip, glusterfs, graphite, grassland, gree_community, grep, grepcounter, groonga, groupcounter, growl,
growthforecast, gstore, hash-forward, hato, hbase, hekk_redshift, heroku-postgres, heroku-syslog, hipchat, histogram, hoop, hostname, hrforecast, http-enhanced, http-ex, http-list, http-status, https-json, idobata, ikachan, imagefile, imkayac, in-udp-event, incremental,
influxdb, influxdb_metrics, inline-classifier, irc, jabber, json-api, json-nest2flat, jsonbucket, jstat, jubatus, jvmwatcher, kafka, kanicounter, keep-forward, kestrel, kibana-server,
kinesis-alt, latency, leftronic, librato-metrics, loggly, lossycount, mackerel, mail, map, measure_time, mecab, metricsense, mixi_community, mixpanel, mobile-carrier, mongo,
mongo-typed, mongokpi, mqtt, msgpack-rpc, mssql, multiprocess, munin, mysql, mysql-binlog, mysql-bulk, mysql-load, mysql-prepared-statement, mysql-query, mysql-replicator,
mysqlslowquery, mysqlslowquerylog, nats, network-probe, nginx-status, nicorepo, norikra, notifier, numeric-counter, numeric-monitor, onlineuser, openldap-monitor, opentsdb, order,
out-http, out-http-buffered, out-solr, parser, pgdist, pghstore, pgjson, ping-message, postgres, qqwry, rambler, rawexec, rds-log, rds-slowlog, reassemble, record
http://www.fluentd.org/plugins
Fluentd patterns
1. read logs from file and write these on storages
file -> in_tail (read, parse) -> out_file (format, write) -> file
file -> in_tail (read, parse) -> out_mongo (insert) -> MongoDB
  https://github.com/fluent/fluent-plugin-mongo
file -> in_tail (read, parse) -> out_mysql (insert) -> MySQL
  https://github.com/tagomoris/fluent-plugin-mysql
file -> in_tail (read, parse) -> out_elasticsearch (send) -> Elasticsearch
  https://github.com/uken/fluent-plugin-elasticsearch
file -> in_tail (read, parse) -> out_webhdfs (format, write) -> Hadoop HDFS
  https://github.com/fluent/fluent-plugin-webhdfs
file -> in_tail (read, parse) -> out_s3 (format, write) -> Amazon S3
  https://github.com/fluent/fluent-plugin-s3
file -> in_tail (read, parse) -> out_redshift (insert) -> Amazon Redshift
  https://github.com/hapyrus/fluent-plugin-redshift
file -> in_tail (read, parse) -> out_bigquery (insert) -> Google BigQuery
  https://github.com/tagomoris/fluent-plugin-bigquery
2. receive and forward data from/to other node
forward nodes: input events, output events
fluent-logger-ruby, fluent-logger-java, ...
send events over TCP
2. receive and forward data from/to other node
forward nodes: load balance, active-standby
2’. receive and forward data from/to other node, over internet & SSL
secure-forward nodes in separate datacenters
send events over SSL with authentication
https://github.com/tagomoris/fluent-plugin-secure-forward
3. connect with other middleware
syslog -> in_syslog
Flume -> in_flume, Scribe -> in_scribe, Kafka -> in_kafka
out_flume -> Flume, out_scribe -> Scribe, out_kafka -> Kafka
https://github.com/fluent/fluent-plugin-flume
https://github.com/fluent/fluent-plugin-scribe
https://github.com/htgc/fluent-plugin-kafka/
4. copy events
forward -> copy -> forward, webhdfs (Hadoop HDFS)
5. count events by string values
forward -> datacounter -> any outputs
count records by regexp patterns
events: { "pattern1_count": 60, "pattern1_rate" : 1.0, "pattern2_count": 20, "pattern2_rate" : 0.33, ...}
https://github.com/tagomoris/fluent-plugin-datacounter
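Counting by regexp patterns can be sketched in plain Ruby. This is not the plugin's code: `count_by_patterns` is a hypothetical helper, each pattern is counted independently, and the rate is assumed to be "matches per second over the counting interval", which is one reading that reproduces the numbers in the sample output (60 matches in 60 seconds gives 1.0).

```ruby
# Hypothetical helper: count values matching each named regexp and derive
# a per-second rate for the given interval (an assumption, see lead-in).
def count_by_patterns(values, patterns, interval_sec)
  patterns.each_with_object({}) do |(name, regexp), result|
    count = values.count { |value| regexp.match?(value.to_s) }
    result["#{name}_count"] = count
    result["#{name}_rate"]  = (count.to_f / interval_sec).round(2)
  end
end

# Example input: HTTP status codes seen during one 60-second window.
STATUS_SAMPLES = (['200'] * 60 + ['404'] * 20).freeze

STATUS_PATTERNS = {
  'pattern1' => /\A2\d\d\z/,  # 2xx responses
  'pattern2' => /\A4\d\d\z/   # 4xx responses
}.freeze

STATUS_COUNTS = count_by_patterns(STATUS_SAMPLES, STATUS_PATTERNS, 60)
```

The resulting Hash has the same flat key layout as the record on the slide, so it could be emitted as a new event and routed to any output.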
5. count events by numeric values
forward -> numeric-counter -> any outputs
count records by numerical range
https://github.com/tagomoris/fluent-plugin-numeric-counter
events: { "pattern1_count": 60, "pattern1_rate" : 1.0, "pattern2_count": 20, "pattern2_rate" : 0.33, ...}
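The same idea with numeric ranges instead of regexps, again as a hedged sketch rather than the plugin's implementation. `count_by_ranges`, the range boundaries and the sample values are all hypothetical; the rate is assumed to be a per-second value over the interval.

```ruby
# Hypothetical helper: count numeric values falling into each named range.
def count_by_ranges(values, ranges, interval_sec)
  ranges.each_with_object({}) do |(name, range), result|
    count = values.count { |value| range.cover?(value) }
    result["#{name}_count"] = count
    result["#{name}_rate"]  = (count.to_f / interval_sec).round(2)
  end
end

# Example input: response times in milliseconds during a 60-second window.
RESPONSE_TIMES_MS = ([10] * 60 + [500] * 20).freeze

RESPONSE_RANGES = {
  'pattern1' => (0...100),     # fast responses (assumed boundary)
  'pattern2' => (100...1000)   # slow responses (assumed boundary)
}.freeze

RESPONSE_COUNTS = count_by_ranges(RESPONSE_TIMES_MS, RESPONSE_RANGES, 60)
```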
5. aggregate numeric values
forward -> numeric-monitor -> any outputs
calculate real-time metrics of numeric values
events: { "max": 128, "min": 16, "avg": 64.0, "sum": 1024, "num": 20, "percentile_50": 48, "percentile_90": 112, ...}
https://github.com/tagomoris/fluent-plugin-numeric-monitor
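The metrics in this record can be computed from a window of values in a few lines. This is a sketch under assumptions: `monitor` and `percentile` are hypothetical helpers, and the percentile uses the nearest-rank method, which may differ from what the plugin actually does.

```ruby
# Nearest-rank percentile over an already sorted array (assumed method).
def percentile(sorted, pct)
  return nil if sorted.empty?

  rank = (sorted.length * pct / 100.0).ceil
  sorted[[rank - 1, 0].max]
end

# Hypothetical helper: summarize one window of numeric values.
def monitor(values)
  return {} if values.empty?

  sorted = values.sort
  sum = sorted.sum
  {
    'max' => sorted.last,
    'min' => sorted.first,
    'avg' => sum.to_f / sorted.length,
    'sum' => sum,
    'num' => sorted.length,
    'percentile_50' => percentile(sorted, 50),
    'percentile_90' => percentile(sorted, 90)
  }
end

WINDOW_METRICS = monitor((1..10).to_a)
```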
6. various inputs: Linux performance (dstat)
dstat -> in_dstat
collect server performance data
https://github.com/shun0102/fluent-plugin-dstat
6. various inputs: SQL execution
RDBMS -> in_sql
input from SELECT
https://github.com/fluent/fluent-plugin-sql
6. various inputs: external command
any commands -> in_exec
input from STDOUT of any commands
7. various outputs: notification on IRC
out_ikachan -> IRC
notice on IRC channel
https://github.com/tagomoris/fluent-plugin-ikachan
14:56 ikachan: HTTP status_4xx crit [2014-06-23 14:56:29 +0900] serviceX: 100.00 (threshold 75.0) http://graph.tool.local/view_graph/accesslog/httpstatus/serviceX_4xx_percentage
14:57 kazeburo: ↑ 40x 100%...
7. various outputs: notification on HipChat
out_hipchat -> HipChat
notice on HipChat
https://github.com/hotchpotch/fluent-plugin-hipchat
7. various outputs: graph tools
out_growthforecast -> GrowthForecast or Focuslight
POST data into graph tools
https://github.com/tagomoris/fluent-plugin-growthforecast
7. various outputs: external command
out_exec -> any commands
output into STDIN of any commands
8. filters: stream processing: external command
any inputs -> exec_filter -> any outputs
exec_filter: format & write into STDIN, read & parse from STDOUT
any commands: read from STDIN, do WHATEVER you want, write into STDOUT
ex: tail -f | grep ... | sed ... | cat
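The external command can itself be a small Ruby script. The sketch below assumes the plugin is configured to exchange one JSON object per line in both directions; `filter_stream` and the `status` / `error` fields are hypothetical and only illustrate the "read from STDIN, do whatever you want, write into STDOUT" contract.

```ruby
require 'json'

# Hypothetical filter: keep only records with an HTTP status of 400 or above
# (the grep-like step) and mark them as errors (the sed-like step).
def filter_stream(input, output)
  input.each_line do |line|
    line = line.strip
    next if line.empty?

    record = JSON.parse(line)
    next unless record['status'].to_i >= 400

    record['error'] = true
    output.puts(JSON.generate(record))
  end
  # Flush so the parent process sees the results without buffering delays.
  output.flush
end
```

Run as a command, the script would end with `filter_stream($stdin, $stdout)`; taking the streams as arguments keeps the logic easy to exercise with in-memory IO objects.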
8. filters: stream processing w/ external server RPC
any inputs -> out_norikra (send) -> Norikra -> in_norikra (fetch) -> any outputs
stream processing w/ SQL
http://norikra.github.io/
SELECT stage, score, COUNT(*) AS c
FROM results.win:time_batch(1 min)
WHERE stage > 1 AND user.valid
GROUP BY stage, score
... And, Fluentd does
error handling and retries for all of these plugins!
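What "retries" can mean in practice is sketched below: a write is retried with a growing wait between attempts. This is a generic exponential-backoff helper, not Fluentd's code. `with_retries`, its parameters and the doubling schedule are assumptions made for illustration; in Fluentd the real logic lives in the core and is controlled by configuration.

```ruby
# Hypothetical helper: call the block, retrying on errors with a wait that
# doubles after each failure. The sleeper is injectable for easy testing.
def with_retries(max_retries: 5, base_wait: 1.0, sleeper: ->(sec) { sleep(sec) })
  attempt = 0
  begin
    yield attempt
  rescue StandardError
    # Give up and re-raise once the retry budget is used up.
    raise if attempt >= max_retries

    sleeper.call(base_wait * (2**attempt))
    attempt += 1
    retry
  end
end
```

A plugin author would wrap only the write step in such a helper, for example `with_retries { client.write(chunk) }`, where `client` and `chunk` stand in for whatever the output talks to.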
Before Fluentd: CHAOS
Access logs: apache, nginx
App logs: frontend, backend
System logs: syslogd, snmp data
Various logs
Metrics: graphs
Analytics: Hadoop, MySQL, MongoDB, Redshift
Archives: Amazon S3, Filesystem
In between: tail -f, scp, python, ruby, cmd, file, logger
After Fluentd: Controllable
Access logs: apache, nginx
App logs: frontend, backend
System logs: syslogd, snmp data
Various logs
Metrics: graphs
Analytics: Hadoop, MySQL, MongoDB, Redshift
Archives: Amazon S3, Filesystem
Fluentd: Now and then
Fluentd versions
Latest: v0.10.50
released on Jun 17, 2014
v0.10.x: Stable versions
many minor feature updates, bug fixes
new features for v1
Fluentd v1
Planned as the first major release
someday in 2014 (?)
100% compatible with v0.10.x
New (and additional) features on v1.x roadmap
https://github.com/fluent/fluentd/issues/251
new configuration syntax, plugin backends
daemon process management
multi-core CPU support
Fluentd on JRuby
Under development!
trying to fix Cool.io to support JRuby
Fluentd on Windows
Under development!
"windows" branch on GitHub fluent/fluentd
Use case in LINE
Analytics data flow overview
Components: servers, Fluentd cluster, Hadoop, Fluentd, Norikra
Outputs: archive, visualization, notifications, application metrics
Components: servers, Fluentd cluster, Hadoop, Fluentd, Norikra
Outputs: archive, visualization, notifications, application metrics
Roles: delivery/stream-map, aggregate/stream-reduce
fluent-agent-lite: non-parsed raw logs, non-parsed access logs
deliver: receive/archive/load-balance
worker: parse/store/forward
watcher: monitor/notify
cep: general-purpose stream processing
Outputs: archive, visualization, notifications, Hadoop, Norikra, application metrics
Fluentd cluster statistics
Fluentd nodes
access/application logs from 600+ nodes
receiver: 5 servers (60 processes)
parser/converter: 10 servers (90 processes)
stream processing: 3 servers
Fluentd cluster statistics
Daily: 5.5+ billion events, 1.5TB+ data
Peak time: 150,000+ events/sec, 300+ Mbps
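A quick back-of-the-envelope check puts the daily and peak figures side by side. The arithmetic below treats the "+" values as exact lower bounds and assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits/sec); the constant names are made up for this example.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

DAILY_EVENTS = 5.5e9    # 5.5 billion events per day
DAILY_BYTES  = 1.5e12   # 1.5 TB per day (decimal TB assumed)
PEAK_EVENTS_PER_SEC = 150_000
PEAK_BITS_PER_SEC   = 300e6   # 300 Mbps

# Daily averages derived from the totals.
AVG_EVENTS_PER_SEC  = DAILY_EVENTS / SECONDS_PER_DAY
AVG_MBPS            = DAILY_BYTES * 8 / SECONDS_PER_DAY / 1e6
AVG_BYTES_PER_EVENT = DAILY_BYTES / DAILY_EVENTS

# Event size implied by the peak figures.
PEAK_BYTES_PER_EVENT = PEAK_BITS_PER_SEC / 8 / PEAK_EVENTS_PER_SEC

puts format('average: %.0f events/sec, %.1f Mbps', AVG_EVENTS_PER_SEC, AVG_MBPS)
puts format('bytes per event: %.1f (daily), %.1f (peak)',
            AVG_BYTES_PER_EVENT, PEAK_BYTES_PER_EVENT)
```

Under these assumptions the daily average is roughly 64,000 events/sec at about 139 Mbps, so the quoted peak is a bit more than twice the average, and both sets of figures imply events of roughly 250 to 270 bytes.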
Fluentd is the best partner for stream-processing newbies and rubyists!
Check out sites and code!
http://fluentd.org/
https://github.com/fluent/fluentd
FAQ
Fault-tolerance?
Node level fault-tolerance
File buffer: processing data can be serialized on disk
Cluster level fault-tolerance
Copy + Forward(load balance, active-standby)
Event level assurance: ACK?
NO (for performance reasons)
Performance?
NOT SO BAD:
real throughput depends on plugin/configuration
simple event transferring: 10-20k events/sec
vs Scribe? vs Flume?
vs Storm?
Eco-system? Clones?
ik
fluent-agent-lite
fluenpy