Open Sourcing Big Data
Transcript of Open Sourcing Big Data
Hadi Sotudeh
About Us
Dr. Sharif
Big Data: From a Business & Managerial Perspective
Bigdata.blog.ir
Torob.ir Co-Founder: Ali Babei
B.S. Project: Distributed Real-Time Processing Crawler (DRPC) using Apache Storm
Dr. Goudarzi
Everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.
Dan Ariely
News
Big Data Definition
Is there any standard definition?
Big Data Definitions: Gartner, McKinsey, ….
Gartner: Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
McKinsey: Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
Data Sources
• Sensors
• Transactions
• GPS
• Social Network
• Sound Files
• Video
• Image
• Telescope
• Log
• Text
• ....
Tim Berners-Lee
Open Data Movement
Open Data:
State/Org     Website
UAE           http://government.ae/web/guest/uae-data
UK            http://data.gov.uk
US            http://data.gov
World Bank    http://data.worldbank.org/
India         http://data.gov.in
Russia        http://opengovdata.ru
EU            Open-data.Europa.eu/en/data
• Google.com/trends/explore
• Google.com/finance
Closed Data!
Social Networks
A Tweet
Edward Snowden
NSA
Log or Dark Data
Importance
Analytics is the discovery and communication of meaningful patterns in data.
Analytics
Types of Analytics
• Cube Analytics (BI): multidimensional, e.g. Product, Date, Price
• Predictive Analytics: statistics and machine learning, e.g. linear regression, data clustering, finding associations
Dimensions of Analytics Variants
• Real Time: ability to analyze the data instantly
• Batch: ability to provide insights several hours or days after a query is posted
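The contrast between the two variants can be sketched in a few lines of Python. This is only an illustration: the names `batch_average` and `RunningAverage` and the sample readings are hypothetical and do not come from the talk.

```python
from typing import Iterable, List


def batch_average(readings: Iterable[float]) -> float:
    """Batch style: wait for the full dataset, then answer once."""
    data: List[float] = list(readings)
    return sum(data) / len(data)


class RunningAverage:
    """Real-time style: keep an up-to-date answer after every event."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        # Incremental mean: no need to store or re-scan old events.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


readings = [10.0, 20.0, 30.0, 40.0]  # hypothetical sensor values

live = RunningAverage()
live_answers = [live.update(r) for r in readings]  # one answer per event
batch_answer = batch_average(readings)             # one answer at the end
```

The real-time version has an answer available after every event, while the batch version only answers once all the data has been collected; both agree on the final value.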
TOOLS
Do It
Real Time
Problems
• Scaling is painful
• Poor fault-tolerance
• Coding is tedious
What We Want
• Guaranteed data processing
• Horizontal scalability
• Fault-tolerance
• “Just works”
What Is The Key?
Hadoop
Batch Oriented System
Storm
• Guaranteed data processing
• Horizontal scalability
• Fault-tolerance
• “Just works”
Use cases
Streams
Spouts
Bolts
Topology
Word Count
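A rough way to picture how streams, spouts, bolts and a topology fit together in the word count example is the plain-Python sketch below. It deliberately does not use the Apache Storm API: the functions `sentence_spout`, `split_bolt`, `count_bolt` and `run_topology` are hypothetical stand-ins, and the sample sentences are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    """Spout stand-in: the source that emits tuples into the stream."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Bolt stand-in: turns each sentence tuple into one tuple per word."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Bolt stand-in: keeps running counts and emits them after each word."""
    counts: Counter = Counter()
    for word in stream:
        counts[word] += 1
        yield dict(counts)


def run_topology(sentences: Iterable[str]) -> Dict[str, int]:
    """Topology stand-in: wires spout -> split bolt -> count bolt."""
    latest: Dict[str, int] = {}
    for latest in count_bolt(split_bolt(sentence_spout(sentences))):
        pass  # a real stream never ends; here we just drain the sample
    return latest


result = run_topology(["the cow jumped", "the moon jumped over the cow"])
```

In an actual Storm deployment each stage would run as many parallel tasks across a cluster; the chained generators here only show the direction in which tuples flow.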
Tuple Tree
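Storm's guaranteed processing rests on tracking the tuple tree: every tuple derived from a spout tuple must be acknowledged before the spout tuple counts as fully processed. Storm does this bookkeeping with an XOR checksum over tuple ids. The sketch below imitates that idea in plain Python under simplifying assumptions: the `Acker` class, its method names and the small fixed ids are hypothetical (Storm itself uses random 64-bit ids and its own internal messages).

```python
from typing import Dict


class Acker:
    """Tracks one XOR checksum per spout tuple (the root of a tuple tree)."""

    def __init__(self) -> None:
        self.pending: Dict[str, int] = {}

    def emit(self, root: str, tuple_id: int) -> None:
        # A new tuple joins the tree: XOR its id into the checksum.
        self.pending[root] = self.pending.get(root, 0) ^ tuple_id

    def ack(self, root: str, tuple_id: int) -> bool:
        # A tuple finished processing: XOR the same id again to cancel it.
        self.pending[root] ^= tuple_id
        if self.pending[root] == 0:
            del self.pending[root]  # every id cancelled: tree is complete
            return True
        return False


acker = Acker()
SENTENCE, WORD_1, WORD_2 = 0x1A, 0x2B, 0x3C  # hypothetical tuple ids

acker.emit("msg-1", SENTENCE)  # spout emits the sentence tuple
acker.emit("msg-1", WORD_1)    # split bolt emits two word tuples...
acker.emit("msg-1", WORD_2)
done = [
    acker.ack("msg-1", SENTENCE),  # ...then acks the sentence
    acker.ack("msg-1", WORD_1),    # count bolt acks the first word
    acker.ack("msg-1", WORD_2),    # and the second: checksum reaches zero
]
```

The checksum needs only a fixed amount of memory per spout tuple, no matter how large the tree grows, which is why this style of tracking scales.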
Resources
• Book
• Apache Storm website
Conclusion
• Data, Data, and Data
• Data Gathering
• Analytics
• Visualization
• Action
• Bottleneck is Creativity not Technology
• Discover Use Cases