Apache Eagle: A Distributed Real-Time Hadoop Data Security Engine from eBay
Uploaded by qingwen-zhao · Category: Engineering
Transcript of Apache Eagle: A Distributed Real-Time Hadoop Data Security Engine from eBay
Apache Eagle: A distributed real-time Hadoop data security engine from eBay
蒋吉麟 | 赵晴雯, eBay
2
Agenda
• About Eagle
• Front End
  – Evolution
  – Modularization
  – Features
• Back End
  – Architecture
  – Tech Highlights
  – Integration
• Q & A
3
Apache Eagle is a distributed real-time monitoring and alerting engine for Hadoop from eBay. It was open sourced as an Apache Incubator project on Oct 26th, 2015.
See http://eagle.incubator.apache.org or http://goeagle.io
4
Hadoop @eBay
• 2007: 1-10 nodes
• 2009: 10+ nodes
• 2010: 100+ nodes, 1,000+ cores, 1 PB
• 2011: 1,000+ nodes, 10,000+ cores, 10+ PB
• 2013: 4,000+ nodes, 40,000+ cores, 50+ PB
• 2015: 10,000+ nodes, 150,000+ cores, 150+ PB
5
• swf
• exe
6
Features
• common
• metadata
• classification
• metrics
7
common
• Policies
• Alerts
8
metadata
9
classification
• Tree View
• Table View
10
metrics
11
Architecture
[Diagram] Components shown:
• Data Collection (Kafka, Yarn API, …) and Hadoop JMX
• Stream Processing Engine: user-profile-based anomaly detection; policy-evaluation-based framework
• Eagle Storage (metadata, metrics, alerts, …)
• User Profile training
• Eagle Query
• Data Sink (email, Kafka, …)
• Other Remediation Systems…
12
Tech Highlights
• Data Collection
• Stream Processing DSL
• Distributed Policy Engine
• ML-based anomaly detection
• Query Framework
NOTE: {NAME}-{NUMBER}, such as HDFS-6914, denotes the ticket ID of an open source project issue contributed by us.
13
Apache Eagle – Data Collection
Decoupled via Apache Kafka
• High-throughput distributed messaging
• Easy to inject various kinds of data sources
• Python/Java/C++ Kafka clients
Currently supported data sources
• Hadoop data
  – HDFS and HBase audit logs
  – GC logs
  – JMX metrics
  – History/running MR job data
• …
• Generic format data
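As an illustration of what "injecting a data source" could look like before the Kafka hop, here is a minimal Python sketch that parses one HDFS audit log line into a flat event and serializes it as JSON bytes. The log layout in the regex, the sample line, and the helper names `parse_audit_line` and `to_kafka_payload` are assumptions for this sketch and are not taken from Eagle's own collectors. The actual send to Kafka is left out, since it depends on the client library in use.

```python
import json
import re

# Assumed layout: "<date> <time> <LEVEL> FSNamesystem.audit: key=value<TAB>key=value ..."
AUDIT_LINE = re.compile(r"^(?P<ts>\S+ \S+) \S+ FSNamesystem\.audit: (?P<body>.*)$")


def parse_audit_line(line):
    """Turn one audit log line into a flat dict, or None if it does not match."""
    match = AUDIT_LINE.match(line.strip())
    if match is None:
        return None
    event = {"timestamp": match.group("ts")}
    # Fields are key=value pairs; tokens without "=" such as "(auth:SIMPLE)" are skipped.
    for key, value in re.findall(r"(\w+)=(\S+)", match.group("body")):
        event[key] = value
    return event


def to_kafka_payload(event):
    """Serialize the event as UTF-8 JSON, i.e. the bytes a Kafka producer would send."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


# Hypothetical sample line, made up for this sketch.
sample = (
    "2015-10-26 12:51:31,798 INFO FSNamesystem.audit: allowed=true\tugi=alice (auth:SIMPLE)\t"
    "ip=/10.0.2.15\tcmd=open\tsrc=/tmp/private\tdst=null\tperm=null"
)
event = parse_audit_line(sample)
payload = to_kafka_payload(event)
```

Keeping the parser separate from the transport mirrors the decoupling on the slide: any client that can write bytes to a topic can act as a data source.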
14
Apache Eagle – Stream Processing DSL
Easy to use
– Easily assemble data transformation, filtering, join…
Flexibility
– Independent of the physical execution platform
[Diagram label: STREAM PROCESSING ENGINE]

    val env = ExecutionEnvironment.getStorm()
    env.fromKafka(KafkaConfig)
       .flatMap(AuditLogTransformer)
       .groupBy(_.user)
       .flatMap(UserProfileAggregator)
       .alert.persistAndEmail
    env.execute()
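To make the flatMap / groupBy / flatMap chain concrete, here is a small in-memory Python sketch of the same three steps. It is not Eagle's DSL and does not run on Storm. The function names and the record fields `user` and `cmd` are hypothetical stand-ins for `AuditLogTransformer` and `UserProfileAggregator`.

```python
from collections import defaultdict


def audit_log_transformer(raw):
    """flatMap step: yield zero or one normalized event per raw record."""
    if "user" in raw and "cmd" in raw:
        yield {"user": raw["user"], "cmd": raw["cmd"]}


def group_by_user(events):
    """groupBy step: bucket events by their 'user' field."""
    groups = defaultdict(list)
    for event in events:
        groups[event["user"]].append(event)
    return groups


def user_profile_aggregator(user, events):
    """flatMap step per group: emit one command-count profile per user."""
    counts = defaultdict(int)
    for event in events:
        counts[event["cmd"]] += 1
    yield {"user": user, "counts": dict(counts)}


def run_pipeline(raw_records):
    """Chain the three steps in the order used on the slide."""
    events = [e for raw in raw_records for e in audit_log_transformer(raw)]
    profiles = []
    for user, user_events in group_by_user(events).items():
        profiles.extend(user_profile_aggregator(user, user_events))
    return profiles


records = [
    {"user": "alice", "cmd": "open"},
    {"user": "bob", "cmd": "delete"},
    {"user": "alice", "cmd": "open"},
    {"malformed": True},  # dropped by the transformer
]
profiles = run_pipeline(records)
```

In a distributed engine the groupBy step also decides which worker receives each event. The sketch keeps everything in one process to show only the data flow.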
15
Apache Eagle – Stream Processing DSL
[Diagram] On env.execute(), the pipeline from the previous slide is deployed to a Distributed Streaming Cluster Environment:
Real-time Event Stream (Stream_{1} … Stream_{*}) → Stream Processing → AlertExecutor_{1}, AlertExecutor_{2}, …, AlertExecutor_{N} → Alerts
16
Apache Eagle – Distributed Real-time Policy Engine
Features
• Extensibility
• Usability
• Real-time
• Scalability
• Metadata-driven
[Diagram] Metadata Manager: Policy Management (Policy, Dynamic Policy Deployment) and Dynamic Stream Schema. Distributed Streaming Cluster Environment: Real-time Event Stream (Stream_{1} … Stream_{*}) → Stream Processing → AlertExecutor_{1}, AlertExecutor_{2}, …, AlertExecutor_{N} → Real Time Alerts.
17
Apache Eagle – Distributed Real-time Policy Engine
Extensibility
• Default is WSO2 Siddhi CEP
• Powerful SQL-like event stream processing
• Open to other customized policy engines
[Diagram] Metadata Manager (Policy/Metadata) → Distributed Real-time Policy Engine: Siddhi CEP Policy Evaluator, Machine Learning Policy Evaluator
Extensible Policy Evaluator

    public interface PolicyEvaluatorServiceProvider {
        // literal string to identify one type of policy
        public String getPolicyType();
        // get policy evaluator implementation
        public Class<? extends PolicyEvaluator<T>> getPolicyEvaluator();
        // policy text with json format to object mapping
        public List getBindingModules();
    }

    public interface PolicyEvaluator {
        // evaluate input event
        public void evaluate(ValuesArray input) throws Exception;
        // policy update
        public void onPolicyUpdate(AlertDefinitionAPIEntity newAlertDef);
        // invoked when policy is deleted
        public void onPolicyDelete();
    }
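The provider/evaluator split can be sketched in a few lines of Python: a registry maps a policy type string to an evaluator class, and each evaluator reacts to events, updates and deletion. This is an analogy to the two Java interfaces, not a port of them. `ThresholdEvaluator`, `register_provider`, `create_evaluator` and the "threshold" policy type are hypothetical names invented for this sketch.

```python
from abc import ABC, abstractmethod


class PolicyEvaluator(ABC):
    """Python analogue of the evaluator contract: evaluate, update, delete."""

    @abstractmethod
    def evaluate(self, event):
        """Evaluate one input event."""

    @abstractmethod
    def on_policy_update(self, new_definition):
        """Swap in a new policy definition."""

    @abstractmethod
    def on_policy_delete(self):
        """Release state when the policy is deleted."""


class ThresholdEvaluator(PolicyEvaluator):
    """Hypothetical evaluator: collect events whose value exceeds a threshold."""

    def __init__(self, definition):
        self.definition = definition
        self.alerts = []

    def evaluate(self, event):
        if self.definition is None:
            return  # policy was deleted, nothing to evaluate
        if event["value"] > self.definition["threshold"]:
            self.alerts.append(event)

    def on_policy_update(self, new_definition):
        self.definition = new_definition

    def on_policy_delete(self):
        self.definition = None


# Registry keyed by policy type, playing the role of the service provider lookup.
PROVIDERS = {}


def register_provider(policy_type, evaluator_class):
    PROVIDERS[policy_type] = evaluator_class


def create_evaluator(policy_type, definition):
    return PROVIDERS[policy_type](definition)


register_provider("threshold", ThresholdEvaluator)
evaluator = create_evaluator("threshold", {"threshold": 1000})
evaluator.evaluate({"name": "ReplLag", "value": 1500})
evaluator.on_policy_update({"threshold": 2000})
evaluator.evaluate({"name": "ReplLag", "value": 1500})
```

Adding another engine then means registering one more class under a new policy type, without touching the code that routes events.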
18
Apache Eagle – Distributed Real-time Policy Engine
Usability
• Powerful SQL-like CEP CQL for policy definition
• Dynamic policy lifecycle management (deployment/update)
• Easy-to-use policy management and alert analytics UI
[Diagram] Metadata Manager: Policy Management (Policy, Dynamic Policy Deployment) → Distributed Streaming Cluster Environment → Real Time Alerts

    from metricStream[(name == 'ReplLag') and (value > 1000)]
    select *
    insert into outputStream;
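Read as a filter, the policy above forwards every event named ReplLag whose value is strictly greater than 1000. The following Python sketch mimics that predicate over an in-memory list. It only illustrates the semantics and does not involve Siddhi. The event fields follow the query, while the function names are made up.

```python
def repl_lag_policy(event):
    """Same predicate as the policy: name == 'ReplLag' and value > 1000."""
    return event.get("name") == "ReplLag" and event.get("value", 0) > 1000


def evaluate(metric_stream, policy):
    """Return the events that the policy would insert into the output stream."""
    return [event for event in metric_stream if policy(event)]


metric_stream = [
    {"name": "ReplLag", "value": 1000},  # not an alert: the comparison is strict
    {"name": "ReplLag", "value": 1001},
    {"name": "HeapUsage", "value": 5000},
]
output_stream = evaluate(metric_stream, repl_lag_policy)
```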
19
Apache Eagle – Distributed Real-time Policy Engine
20
Apache Eagle – Distributed Real-time Policy Engine
Real-time
• Stream events are processed and alerts are evaluated during streaming
[Diagram] Distributed Streaming: Real-time Event Stream (Stream_{1} … Stream_{*}) → Stream Processing → AlertExecutor_{1}, AlertExecutor_{2}, …, AlertExecutor_{N} → Real Time Alerts
21
Apache Eagle – Distributed Real-time Policy Engine
Metadata-Driven
• Stream Schema: AlertStreamSchemaEntity
• Policy Definition: AlertDefinitionAPIEntity
[Diagram] Metadata Manager → Dynamic Metadata Loading → Distributed Real-time Policy Engine

    @Table("alertdef")
    @ColumnFamily("f")
    @Prefix("alertdef")
    @Service(AlertConstants.ALERT_DEFINITION_SERVICE_ENDPOINT_NAME)
    @JsonIgnoreProperties(ignoreUnknown = true)
    @TimeSeries(false)
    @Tags({"site", "dataSource", "alertExecutorId", "policyId", "policyType"})
    @Indexes({
        @Index(name = "Index_1_alertExecutorId", columns = { "alertExecutorID" }, unique = true),
    })
    public class AlertDefinitionAPIEntity extends TaggedLogAPIEntity {
        @Column("a")
        private String desc;
        @Column("b")
        private String policyDef;
        @Column("c")
        private String dedupeDef;
        …
    }
22
Apache Eagle – Distributed Real-time Policy Engine
Scalability
• Policy scalability: policy partitioning
• Event scalability: grouping
• Example: N users with 3 partitions and M policies with 2 partitions give 3 × 2 = 6 physical tasks
[Diagram] Distributed Streaming Cluster Environment: Stream_{1} … Stream_{*} → Stream Processing → AlertExecutor_{1}, AlertExecutor_{2}, …, AlertExecutor_{N}
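The 3 × 2 example can be worked through in Python: users are hashed into 3 partitions, policies into 2, and each (user partition, policy partition) pair is one physical task. The hashing scheme (CRC32 modulo the partition count) and all function names are assumptions of this sketch. The slide does not say how Eagle assigns partitions.

```python
import zlib
from itertools import product

USER_PARTITIONS = 3    # from the slide's example
POLICY_PARTITIONS = 2  # from the slide's example


def partition_of(key, num_partitions):
    """Stable hash partitioning; crc32 gives the same result on every run."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def physical_tasks(user_partitions, policy_partitions):
    """One task per (user partition, policy partition) pair."""
    return list(product(range(user_partitions), range(policy_partitions)))


def task_for(user, policy_id):
    """The single task that evaluates policy `policy_id` on events of `user`."""
    return (
        partition_of(user, USER_PARTITIONS),
        partition_of(policy_id, POLICY_PARTITIONS),
    )


tasks = physical_tasks(USER_PARTITIONS, POLICY_PARTITIONS)
print(len(tasks))  # 6
```

With this layout, adding users only grows the load per user partition, and adding policies only grows the load per policy partition. Either dimension can then be scaled by raising its partition count.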
23
Apache Eagle – Query Framework
Query Syntax
• Full-function SQL-like REST query (aggregation, sorting, …)
Eagle Storage
• NoSQL storage like HBase
• RDBMS
• Other storage systems
24
Apache Eagle – ML-based Anomaly Detection
User Activity Anomaly Detection
• User profile feature selection
• Offline user profile generation
• Online anomaly detection
Useful link
• Eagle: User profile-based anomaly detection for securing Hadoop clusters
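The offline/online split can be illustrated with a deliberately simple stand-in model: a per-feature mean and standard deviation learned offline, and a deviation check applied online. This is not the model Eagle uses, which the slide does not describe. The feature names, the 3-standard-deviation threshold and the function names are all assumptions of this sketch.

```python
from statistics import mean, pstdev


def train_profile(history):
    """Offline step: history is a list of {feature: count} windows for one user."""
    features = sorted({name for window in history for name in window})
    profile = {}
    for name in features:
        values = [window.get(name, 0) for window in history]
        profile[name] = (mean(values), pstdev(values))
    return profile


def is_anomalous(profile, window, threshold=3.0):
    """Online step: flag a window if any profiled feature deviates too far."""
    for name, (mu, sigma) in profile.items():
        value = window.get(name, 0)
        if sigma == 0:
            # No variation seen in training: any different value is suspicious.
            if value != mu:
                return True
            continue
        if abs(value - mu) / sigma > threshold:
            return True
    return False


# Hypothetical per-window operation counts for one user.
history = [
    {"open": 10, "delete": 1},
    {"open": 12, "delete": 0},
    {"open": 11, "delete": 1},
    {"open": 9, "delete": 2},
]
profile = train_profile(history)
normal = is_anomalous(profile, {"open": 11, "delete": 1})
suspicious = is_anomalous(profile, {"open": 10, "delete": 50})
```

The feature selection step on the slide corresponds to choosing which counts go into each window. Here that choice is simply hard-coded in the sample data.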
25
Apache Eagle – Integration I
• Eagle in Apache Ambari
  – Natively part of the Hadoop ecosystem
  – http://eagle.incubator.apache.org/docs/ambari-plugin-install.html
• Eagle in Docker
  – Runs natively on cloud/container platforms
  – https://github.com/apache/incubator-eagle
26
Apache Eagle – Integration II
• Apache Ranger
  – Remediation engine
  – Eagle data source
• Splunk
  – Eagle alert consumer
  – Eagle alert output is the first abstraction of analytics and Splunk is the second
• Dataguise, Apache Knox
  – Eagle data source
27
Learn more about Apache Eagle
• EAGLE: USER PROFILE-BASED ANOMALY DETECTION IN HADOOP CLUSTER (IEEE)
• EAGLE: DISTRIBUTED REALTIME MONITORING FRAMEWORK FOR HADOOP CLUSTER
28
Q&A
apache/incubator-eagle
@TheApacheEagle
@ApacheEagle
http://eagle.incubator.apache.org