RFX - Full-Stack Technology for Real-time Big Data
Uploaded by: trieu-nguyen
Category: Data & Analytics
Key questions
1. What is RFX?
2. Why RFX?
3. How to use RFX?
4. The vision ...
by [email protected] on 27/01/2016
http://engineering.adsplay.net
History
● Applied Lambda Architecture
  ○ https://en.wikipedia.org/wiki/Lambda_architecture
● In 2012, we used Apache Storm http://storm.apache.org (version 0.7), but we wanted to improve on it and turn it into a full-stack framework
● In 2013, I started RFX with a “Reactive philosophy in mind” for common Big Data problems
● Since 2014, RFX has been our main tool for daily real-time big data tasks at FPT
● Core engineers:
  ○ [email protected]
  ○ [email protected]
What is RFX ?
● RFX is “Reactive Function X”
● “Function X” is a feature in a specific product
● “Reactive” means every function can “feel” and “react” to optimize the UX for a user in a specific context.
● The framework is built from open source projects:
  ○ Computing Unit with Akka Actor ( http://akka.io )
  ○ Network Communication with Netty ( http://netty.io )
  ○ Data Processing with Apache { Kafka, Hadoop, Spark }
  ○ Redis ( http://redis.io )
  ○ Front-end with MEAN stack (MongoDB, ExpressJS, AngularJS, NodeJS)
Projects and Products using RFX
1. http://vnexpress.net
   a. counting article pageviews
   b. recommendation engine
2. https://eclick.vn
   a. click analytics
   b. impression analytics
3. http://itvad.vn
   a. Video PlayView Analytics
   b. User Behaviour Analytics
   c. Heatmap Analytics
   d. Device Analytics
   e. Revenue Ad Optimization
4. …
Why RFX?
● Divide code into Micro-Services:
  ○ Analytical layer ( rfx-stream )
  ○ Business logic layer ( rfx-query )
  ○ Machine Learning layer (Apache Spark)
  ○ Database layer (Redis, Mongo, Hadoop)
  ○ Front-end layer (MEAN stack)
● Focus on best practices and reusability
● Foundation for scalability (system and business)
● Test-driven development for Real-Time Analytics
● Continuous integration & improvement
Core backend modules
rfx-track:
● collecting all events from JavaScript delivery
rfx-stream:
● processing stream data (PipelineProcessing pattern)
● processing real-time analytics
● processing business logic (by reactive function)
rfx-cronjob:
● synchronizing real-time data to report database (copy data from Redis to MongoDB)
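A minimal sketch of how the stream and cronjob roles above could fit together. This is not RFX code: the stage names, the event fields (`type`, `url`) and both stores are assumptions made for illustration, and plain Python dicts stand in for Redis and MongoDB.

```python
import json
from collections import Counter

# Hypothetical stand-ins: a Counter plays the role of Redis (real-time
# counters) and a dict plays the role of MongoDB (report database).
realtime_store = Counter()
report_store = {}


def parse(raw_line):
    """Stage 1: decode one raw JSON log line; reject anything malformed."""
    try:
        event = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    return event if isinstance(event, dict) else None


def validate(event):
    """Stage 2: reject events that lack the fields the next stage needs."""
    if "type" in event and "url" in event:
        return event
    return None


def count(event):
    """Stage 3: increment the real-time counter for this event type and URL."""
    realtime_store[(event["type"], event["url"])] += 1
    return event


PIPELINE = [parse, validate, count]


def process(raw_line):
    """Pass one raw line through every stage; stop when a stage rejects it."""
    item = raw_line
    for stage in PIPELINE:
        item = stage(item)
        if item is None:
            return False
    return True


def sync_to_report():
    """Cron-style job: copy the current real-time counters to the report store."""
    for (event_type, url), total in realtime_store.items():
        report_store[f"{event_type}:{url}"] = total
```

Each stage takes the output of the previous one, so a stage can be added or swapped without touching the others. `sync_to_report()` would be called on a schedule, separately from `process()`.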
Core frontend modules
rfx-report:
● visualizing data in real-time
● monitoring real-time events
rfx-agent:
● tracking user activity: heatmap data, ...
● logging user activity to rfx-track (via network protocol: HTTP, TCP or UDP)
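As an illustration of the agent-to-tracker path over UDP, one of the three protocols listed above, here is a small sketch using only the Python standard library. The function names and the JSON event shape are hypothetical; the real rfx-agent and rfx-track wire format is not described in these slides.

```python
import json
import socket


def open_collector(host="127.0.0.1", port=0):
    """Collector side: bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def send_event(host, port, event):
    """Agent side: serialize one activity event and send it as one datagram."""
    payload = json.dumps(event).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


def receive_event(sock, timeout=2.0):
    """Collector side: wait for one datagram and decode it back into a dict."""
    sock.settimeout(timeout)
    data, _addr = sock.recvfrom(65535)
    return json.loads(data.decode("utf-8"))
```

UDP is fire-and-forget: the agent never blocks on the tracker, but a lost datagram is a lost event. HTTP or TCP would trade some of that speed for delivery guarantees.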
What problems could be solved with RFX
1. Processing Logs:
   a. Pageview
   b. Ad Impression
   c. Click analytics
   d. Heatmap User Data
2. real-time user segmentation
3. react to user behaviour
4. auto UX optimization
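Items 2 and 3 above can be pictured with a small sketch: per-user counters are updated on every event, segment rules are re-checked, and a callback "reacts" the first time a user enters a segment. The segment names, thresholds and the `Segmenter` class are assumptions made for illustration, not part of RFX.

```python
from collections import defaultdict

# Hypothetical rules: (segment name, predicate over a user's event counts).
SEGMENT_RULES = [
    ("heavy_reader", lambda stats: stats["pageview"] >= 5),
    ("ad_clicker", lambda stats: stats["click"] >= 2),
]


class Segmenter:
    """Keeps per-user event counts and re-evaluates segments on every event."""

    def __init__(self, on_enter):
        self.stats = defaultdict(lambda: defaultdict(int))
        self.segments = defaultdict(set)
        self.on_enter = on_enter  # reaction callback: (user_id, segment)

    def handle(self, user_id, event_type):
        """Count the event, then fire the reaction for each newly matched segment."""
        self.stats[user_id][event_type] += 1
        for name, rule in SEGMENT_RULES:
            already_in = name in self.segments[user_id]
            if not already_in and rule(self.stats[user_id]):
                self.segments[user_id].add(name)
                self.on_enter(user_id, name)
```

The callback is where a "reactive function" would plug in, for example switching the layout or the recommendation shown to that user.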
Vision for RFX
http://engineering.adsplay.net/2015/10/08/iris-big-data-query-for-human
Research links
● Ad Click Prediction: http://research.google.com/pubs/pub41159.html
● Software Engineering for Machine Learning https://sites.google.com/site/software4ml/accepted-papers
● Fault-tolerant and Scalable Joining of Continuous Data Streams http://research.google.com/pubs/pub41318.html
● Dynamic Ad Layout Revenue Optimization for Display Advertising http://wan.poly.edu/KDD2012/forms/workshop/ADKDD12/doc/a2.pdf
● Behavioral analytics http://en.wikipedia.org/wiki/Behavioral_analytics
● Real-time User Segmentation http://www.slideshare.net/Hadoop_Summit/doctor-nguyen-june27425pmroom230av2
● Implementing a real-time data pipeline https://chimpler.wordpress.com/2014/07/01/implementing-a-real-time-data-pipeline-with-spark-streaming/
● Distributed Event Processing Rule Engine http://eugenedvorkin.com/distributed-event-processing-rule-engine-with-storm-spring-and-groovy/
http://www.rfxlab.com http://engineering.adsplay.net