WUNCA-33
Logging and Big Data
[email protected]
@natawutn
http://natawutn.wordpress.com
http://www.slideshare.net/natawutnupairoj
IT LOG SERVER?
LOG
Real-Time
Server
2-3
IT LOG
Users: 40,000+
Servers: 500+
Wi-Fi + NAT
Manual processes
Approx. traffic: 5,000 events / sec
Storage (90 days) = 39,000,000,000 events (6.5 TB)
Uses Syslog + Graylog2 (built on Elasticsearch)
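The traffic and storage figures above are consistent with each other: 5,000 events/sec × 86,400 sec/day × 90 days ≈ 3.9 × 10^10 events, and 6.5 TB spread over that many events works out to roughly 167 bytes per event. A small sketch of that arithmetic, assuming a constant average rate and decimal terabytes (the function names are hypothetical):

```python
# Back-of-the-envelope sizing for the log server figures quoted above.
# Assumptions: 5,000 events/sec is a steady 24-hour average; 1 TB = 10**12 bytes.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_retained(events_per_sec: int, retention_days: int) -> int:
    """Total number of events kept over the whole retention window."""
    return events_per_sec * SECONDS_PER_DAY * retention_days


def avg_bytes_per_event(total_bytes: float, total_events: int) -> float:
    """Average stored size per event implied by a total storage figure."""
    return total_bytes / total_events


if __name__ == "__main__":
    total = events_retained(5_000, 90)
    print(f"{total:,} events")  # 38,880,000,000 events (about 39 billion)
    per_event = avg_bytes_per_event(6.5e12, total)
    print(f"~{per_event:.0f} bytes/event")  # ~167 bytes/event
```

The same two functions make it easy to re-estimate disk needs if the event rate or the retention period changes.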
Log
Real-Time
ARCHITECTURAL PATTERN #1: BASIC ARCHITECTURE
ARCHITECTURAL PATTERN #2: LAMBDA ARCHITECTURE
Speed Layer
Batch Layer
Serving Layer
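The three layers can be illustrated with a toy "events per host" view. This is a minimal single-process sketch under stated assumptions: the batch layer recomputes a view from the full stored dataset, the speed layer counts only events that arrived after the last batch run, and the serving layer merges both at query time. All names here are hypothetical; in practice the batch layer would run on something like Hadoop or Spark.

```python
from collections import Counter
from typing import Iterable


def batch_view(master_dataset: Iterable[dict]) -> Counter:
    """Batch layer: recompute the view from scratch over all stored events."""
    return Counter(event["host"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by a batch run."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, event: dict) -> None:
        self.realtime_view[event["host"]] += 1

    def reset(self) -> None:
        # Called once the batch layer has absorbed these events.
        self.realtime_view.clear()


def serving_query(batch: Counter, realtime: Counter, host: str) -> int:
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch[host] + realtime[host]


if __name__ == "__main__":
    master = [{"host": "web1"}, {"host": "web1"}, {"host": "db1"}]
    batch = batch_view(master)

    speed = SpeedLayer()
    speed.on_event({"host": "web1"})  # arrived after the last batch run

    print(serving_query(batch, speed.realtime_view, "web1"))  # 3
```

The design trade-off is that the batch view is simple and recomputable but stale, while the speed layer is fresh but only has to cover the short window since the last batch run.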
Python
APACHE HADOOP
Open-source software framework modeled on Google's search engine architecture
Commodity Hardware
MapReduce: parallel processing across a cluster
Hadoop Distributed File System (HDFS): reliable storage
Users include Yahoo!, Facebook, Amazon, eBay, American Airlines, Apple, Google, HP, IBM, Microsoft, Netflix, New York Times, and others
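The MapReduce model splits a job into a map step that emits key/value pairs, a shuffle that groups pairs by key, and a reduce step that aggregates each group. The sketch below runs those phases sequentially in one process only to show the data flow (Hadoop runs them in parallel across the cluster); the `<host> <message>` log format and all function names are assumptions for illustration, not Hadoop's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (host, 1) for a log line assumed to look like '<host> <message>'."""
    parts = line.split(maxsplit=1)
    if parts:  # skip blank lines
        yield parts[0], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts for one key."""
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    logs = ["web1 GET /index", "db1 slow query", "web1 GET /login"]
    print(map_reduce(logs))  # {'web1': 2, 'db1': 1}
```

Because each map call looks at one line and each reduce call looks at one key, both phases can be distributed over many commodity machines without coordination beyond the shuffle.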
APACHE SPARK
In-memory data processing, developed at UC Berkeley
Supports MapReduce-style batch execution, interactive queries, and stream processing
APIs for Java, Python, Scala, and R, plus analytic libraries (machine learning, graph processing)
Up to 10-100x faster than Hadoop
ELASTICSEARCH
Open-Source Search Engine
Real-Time data
Scale-Out Cluster
Shard
Shard Timestamp Log
1 Shard Copy (Replication)
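The slide pairs shards with log timestamps. One common way to apply this to log data (an assumption here, not spelled out on the slide) is to write each day's events into its own date-named index, similar in spirit to Logstash's default daily indices. A 90-day retention policy then becomes "drop whole indices older than the cutoff" instead of deleting individual documents. The `logs-YYYY.MM.DD` naming and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def index_for(ts: datetime, prefix: str = "logs") -> str:
    """Name of the daily index an event with timestamp `ts` is written to."""
    return f"{prefix}-{ts:%Y.%m.%d}"


def expired_indices(indices: List[str], now: datetime,
                    retention_days: int, prefix: str = "logs") -> List[str]:
    """Return the daily indices whose date falls before the retention cutoff."""
    cutoff = (now - timedelta(days=retention_days)).date()
    expired = []
    for name in indices:
        day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    now = datetime(2015, 6, 30, tzinfo=timezone.utc)
    print(index_for(now))  # logs-2015.06.30
    print(expired_indices(["logs-2015.03.31", "logs-2015.04.01"], now, 90))
    # ['logs-2015.03.31']
```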
# of shards = 1, # of replicas = 1
# of shards = 2, # of replicas = 1
# of shards = 3, # of replicas = 1
# of shards = 3, # of replicas = 2
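For the four configurations above, each primary shard has `replicas` extra copies, so the cluster holds `shards × (1 + replicas)` shard copies in total: 2, 4, 6, and 9 respectively. Elasticsearch never places a replica on the same node as another copy of the same shard, so fully allocating `replicas` copies needs at least `replicas + 1` nodes. Documents are routed with a formula of the form `shard = hash(_routing) % number_of_primary_shards`; in this sketch CRC32 stands in for Elasticsearch's internal hash function, and the function names are hypothetical.

```python
import zlib


def total_shard_copies(num_shards: int, num_replicas: int) -> int:
    """Primaries plus replicas: each primary has num_replicas copies."""
    return num_shards * (1 + num_replicas)


def min_nodes_for_full_allocation(num_replicas: int) -> int:
    """Copies of the same shard must live on different nodes."""
    return num_replicas + 1


def route(routing_key: str, num_shards: int) -> int:
    """shard = hash(routing) % number_of_primary_shards.

    CRC32 is a stand-in; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


if __name__ == "__main__":
    for shards, replicas in [(1, 1), (2, 1), (3, 1), (3, 2)]:
        print(shards, replicas,
              total_shard_copies(shards, replicas),
              min_nodes_for_full_allocation(replicas))
    # 1 1 2 2
    # 2 1 4 2
    # 3 1 6 2
    # 3 2 9 3
```

The routing formula also shows why the primary shard count is fixed when an index is created: changing `num_shards` would send existing document IDs to different shards.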
DATA COLLECTOR (LOG SHIPPER)
Server Big Data Real-Time
/ In-Flight Data
Adapter / Plugin Architecture
Reliability / Availability
Source: Sematext, "Top 5 Most Popular Log Shippers," http://blog.sematext.com/2014/10/06/top-5-most-popular-log-shippers/
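The adapter/plugin and reliability points can be illustrated with a hypothetical shipper: outputs are registered as interchangeable plugins, and events stay in a bounded buffer until every output accepts them, so a temporarily unreachable collector does not lose data. This is a minimal sketch under those assumptions, not the design of any particular shipper from the list above. Delivery here is at-least-once: if one of several outputs fails, an output that already succeeded may receive the event again on retry.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class Shipper:
    """Hypothetical log shipper: pluggable outputs plus a bounded retry buffer."""

    def __init__(self, max_buffer: int = 10_000) -> None:
        self.outputs: Dict[str, Callable[[str], None]] = {}
        # When full, deque(maxlen=...) silently drops the oldest event.
        self.buffer: Deque[str] = deque(maxlen=max_buffer)

    def register_output(self, name: str, send: Callable[[str], None]) -> None:
        """Plugin point: any callable that accepts one event can be an output."""
        self.outputs[name] = send

    def ship(self, event: str) -> None:
        self.buffer.append(event)
        self.flush()

    def flush(self) -> None:
        """Deliver buffered events in order; stop at the first failure and retry later."""
        while self.buffer:
            event = self.buffer[0]
            try:
                for send in self.outputs.values():
                    send(event)
            except ConnectionError:
                return  # leave the event buffered
            self.buffer.popleft()


if __name__ == "__main__":
    delivered: List[str] = []
    down = True

    def flaky_output(event: str) -> None:
        if down:
            raise ConnectionError("collector unreachable")
        delivered.append(event)

    shipper = Shipper()
    shipper.register_output("central", flaky_output)
    shipper.ship("sshd: login failed")  # buffered while the output is down
    down = False
    shipper.ship("sshd: login ok")      # flush delivers both, in order
    print(delivered)  # ['sshd: login failed', 'sshd: login ok']
```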
APACHE FLUME
Open Source
Distributed / Reliable / Scalable
Event
APACHE FLUME OVERVIEW BY GETINDATA
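In Flume, the unit of data is an event (a byte payload plus optional string headers) that flows from a source into a channel and from the channel to a sink. Reliability comes from channel transactions: a sink's take is only committed after delivery succeeds, otherwise it is rolled back. The Python model below mirrors that idea under stated assumptions; it is not Flume's API (Flume itself is written in Java), and `MemoryChannel` and `drain` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional


@dataclass
class Event:
    """Modeled on Flume's event: a byte payload plus optional string headers."""
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


class MemoryChannel:
    """Hypothetical in-memory channel with put/take and commit/rollback."""

    def __init__(self, capacity: int = 100) -> None:
        self.capacity = capacity
        self.queue: Deque[Event] = deque()
        self.in_flight: List[Event] = []  # taken but not yet committed

    def put(self, event: Event) -> bool:
        """Source side: returns False when full so the source can retry later."""
        if len(self.queue) + len(self.in_flight) >= self.capacity:
            return False
        self.queue.append(event)
        return True

    def take(self) -> Optional[Event]:
        if not self.queue:
            return None
        event = self.queue.popleft()
        self.in_flight.append(event)
        return event

    def commit(self) -> None:
        self.in_flight.clear()

    def rollback(self) -> None:
        # Push in-flight events back to the front, preserving their original order.
        while self.in_flight:
            self.queue.appendleft(self.in_flight.pop())


def drain(channel: MemoryChannel,
          deliver: Callable[[List[Event]], None],
          batch_size: int = 10) -> int:
    """Sink side: take a batch, deliver it, commit on success, roll back on failure."""
    batch: List[Event] = []
    while len(batch) < batch_size:
        event = channel.take()
        if event is None:
            break
        batch.append(event)
    try:
        deliver(batch)
    except OSError:
        channel.rollback()
        return 0
    channel.commit()
    return len(batch)


if __name__ == "__main__":
    ch = MemoryChannel()
    ch.put(Event(b"GET /index", {"host": "web1"}))
    ch.put(Event(b"GET /login", {"host": "web1"}))

    def destination_down(batch: List[Event]) -> None:
        raise OSError("destination unavailable")

    print(drain(ch, destination_down), len(ch.queue))  # 0 2  (events kept)

    stored: List[Event] = []
    print(drain(ch, stored.extend), len(ch.queue))  # 2 0
```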
LOG LAMBDA ARCHITECTURE
Traffic Anomaly (SARIMA)
Top Related
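SARIMA (seasonal ARIMA) forecasts seasonal traffic, and points whose residual (actual minus forecast) is unusually large are flagged as anomalies. Fitting a real SARIMA model needs a statistics library such as statsmodels' `SARIMAX`. As a dependency-free stand-in, the sketch below swaps in a much simpler seasonal-naive forecast (the value at the same slot one season earlier) while keeping the same residual-threshold idea; it is not SARIMA, and the function name and 3-sigma threshold are assumptions. One known limitation: a spike also distorts the forecast one season later.

```python
from statistics import mean, pstdev
from typing import List


def seasonal_anomalies(series: List[float], season: int,
                       threshold: float = 3.0) -> List[int]:
    """Flag indexes that deviate strongly from a seasonal-naive forecast.

    Forecast for t is series[t - season] (same slot, one season earlier).
    A point is anomalous if its residual lies more than `threshold`
    standard deviations from the mean residual.
    """
    residuals = [series[t] - series[t - season] for t in range(season, len(series))]
    if len(residuals) < 2:
        return []
    mu, sigma = mean(residuals), pstdev(residuals)
    if sigma == 0:
        return []  # perfectly regular traffic: nothing stands out
    return [t for t, r in zip(range(season, len(series)), residuals)
            if abs(r - mu) > threshold * sigma]


if __name__ == "__main__":
    # One week of hypothetical hourly event counts with a daily (24-hour) pattern.
    day = [100 + 10 * (h % 12) for h in range(24)]
    week = day * 7
    week[-1] += 500  # inject a spike in the last hour
    print(seasonal_anomalies(week, season=24))  # [167]
```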