+
100062108 李智宇、100062116 林威宏、100062220 施閔耀
+Outline
Introduction
Architecture of Hadoop
HDFS
MapReduce
Comparison
Why Hadoop
Conclusion
+What is Hadoop?
An open-source software framework
Processes and stores big data
Easy to use and implement, economical, flexible
Runs across many nodes (servers)
Written in Java
Free license
Created by Doug Cutting and Mike Cafarella in 2005
+Advantages of an Interpreted Language
Java is compiled to bytecode that runs on the JVM
Cross-platform (e.g., Windows, Ubuntu, Mac OS X)
Smaller executable program size
Easier to modify during both development and execution
+Architecture of Hadoop
+Hadoop in Enterprise
The Dell representation of the Hadoop ecosystem.
+Hadoop in Enterprise
+Who is using Hadoop?
By 2013, more than half of the Fortune 50 were using Hadoop
+HDFS
Hadoop Distributed File System
Client: the user
Name node: manages and stores the metadata and the file namespace
Data node: stores the file data
Each data node periodically sends its status to the name node
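The periodic status report can be pictured as a heartbeat table kept by the name node. The sketch below is only an illustration in Python, not Hadoop's implementation: the `NameNode` class, its method names, and the 30-second timeout are all hypothetical.

```python
class NameNode:
    """Toy name node that remembers when each data node last reported in."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # hypothetical timeout, not a Hadoop default
        self.last_seen = {}         # data node id -> time of last status report

    def heartbeat(self, node_id, now):
        """Record a status report from a data node."""
        self.last_seen[node_id] = now

    def dead_nodes(self, now):
        """Return the nodes that have been silent for longer than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)


nn = NameNode(timeout_s=30)
nn.heartbeat("dn1", now=0)
nn.heartbeat("dn2", now=0)
nn.heartbeat("dn1", now=25)   # dn1 keeps reporting, dn2 goes silent
print(nn.dead_nodes(now=40))
```

Passing the time in explicitly (instead of reading a clock) keeps the sketch deterministic and easy to check by hand.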
+HDFS: Writing data in HDFS
Each file is divided into blocks (64 MB or 128 MB each), and each block has three copies on different data nodes.
The client asks the name node for a list of data nodes sorted by distance and sends the file to the nearest one; that data node then forwards the file to the remaining nodes.
When the operation above is done, the data node sends “done” to the name node.
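The write steps above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the HDFS client API: the function names, the node ids, and the numeric "distance" values are invented for illustration, and the forwarding between data nodes is modeled as a simple loop.

```python
import math

BLOCK_SIZE_MB = 64   # the slide mentions 64 MB or 128 MB blocks
REPLICATION = 3      # three copies of each block


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size of each block; only the last block may be smaller."""
    if file_size_mb <= 0:
        raise ValueError("file size must be positive")
    n = math.ceil(file_size_mb / block_size_mb)
    return [block_size_mb] * (n - 1) + [file_size_mb - block_size_mb * (n - 1)]


def choose_pipeline(distances, replication=REPLICATION):
    """Name-node side: pick the nearest data nodes, closest first."""
    return sorted(distances, key=distances.get)[:replication]


def write_block(block_id, pipeline, storage):
    """The client sends to the first node; each node passes the block on."""
    for node in pipeline:
        storage.setdefault(node, set()).add(block_id)
    return "done"  # reported to the name node once every copy is stored


distances = {"dn1": 4, "dn2": 1, "dn3": 2, "dn4": 7}  # hypothetical distances
storage = {}
sizes = split_into_blocks(200)
pipeline = choose_pipeline(distances)
status = [write_block(f"blk_{i}", pipeline, storage) for i in range(len(sizes))]
print(sizes, pipeline)
```

A 200 MB file with 64 MB blocks yields three full blocks and one 8 MB block, and every block lands on the three nearest nodes.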
+HDFS: Reading data in HDFS
The client sends the filename to the name node, and the name node returns a list of the file's blocks, with the data nodes holding each block sorted by distance.
The client uses the list to get the file from the data nodes.
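The read path can be sketched the same way. The Python below is a toy model, assuming a metadata table that maps a filename to its blocks and their holders; `block_locations`, `read_file`, and the sample data are hypothetical and do not correspond to real HDFS calls.

```python
def block_locations(metadata, filename, distances):
    """Name-node side: for each block, list its holders nearest-first."""
    return [(block, sorted(holders, key=distances.get))
            for block, holders in metadata[filename]]


def read_file(locations, data_nodes):
    """Client side: fetch each block from the nearest holder and join them."""
    return b"".join(data_nodes[holders[0]][block]
                    for block, holders in locations)


# Hypothetical cluster state: which nodes hold which blocks.
metadata = {"log.txt": [("blk_0", ["dn1", "dn2", "dn3"]),
                        ("blk_1", ["dn2", "dn3", "dn4"])]}
distances = {"dn1": 4, "dn2": 1, "dn3": 2, "dn4": 7}
data_nodes = {
    "dn1": {"blk_0": b"hello "},
    "dn2": {"blk_0": b"hello ", "blk_1": b"world"},
    "dn3": {"blk_0": b"hello ", "blk_1": b"world"},
    "dn4": {"blk_1": b"world"},
}

locations = block_locations(metadata, "log.txt", distances)
content = read_file(locations, data_nodes)
print(content)
```

Note that the name node only answers with locations; the file bytes travel directly between the client and the data nodes.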
+HDFS: failure
Node failure
Communication failure
Data corruption
+HDFS: handle failure
Handling a write failure:
A data node that does not return an ACK is skipped.
Handling a read failure:
Recall that when reading a file, the client gets a list of the data nodes that contain the file, so it can fall back to another node on the list.
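Read failover can be illustrated with a short Python sketch: the client walks the list of holders and uses the first one that still responds. The `read_block` function and the `alive` set are hypothetical stand-ins for a real network call and its failure detection.

```python
def read_block(block, holders, data_nodes, alive):
    """Try each holder in order and return the block from the first live one."""
    for node in holders:
        if node in alive and block in data_nodes.get(node, {}):
            return data_nodes[node][block]
    # Every listed replica is unreachable.
    raise IOError(f"no live replica of {block}")


data_nodes = {"dn2": {"blk_0": b"hello"}, "dn3": {"blk_0": b"hello"}}
holders = ["dn2", "dn3"]           # nearest first, as returned by the name node

# dn2 is down, so the client falls back to dn3.
data = read_block("blk_0", holders, data_nodes, alive={"dn3"})
print(data)
```

If no node in the list is alive, the sketch raises an error, since there is nothing left to fall back to.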
+HDFS: handle failure
Name node handling a node failure:
The name node finds out which data the failed node held, copies that data from the other replicas, and stores it on other data nodes.
Note that HDFS can’t guarantee that at least one copy of the data is alive.
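A minimal Python sketch of this recovery step follows, assuming the name node keeps a map from each block to the set of nodes holding it. The function name `re_replicate` and the sample cluster are hypothetical; the actual copying of bytes is reduced to adding a node to the holder set.

```python
def re_replicate(block_map, failed, live_nodes, replication=3):
    """Restore the replica count after `failed` dies.

    block_map maps a block id to the set of nodes holding it and is updated
    in place. Returns the blocks whose every copy was lost.
    """
    lost = []
    for block, holders in block_map.items():
        holders.discard(failed)
        if not holders:
            lost.append(block)  # no surviving replica to copy from
            continue
        # Live nodes that do not already hold this block.
        candidates = sorted(n for n in live_nodes if n not in holders)
        while len(holders) < replication and candidates:
            holders.add(candidates.pop(0))  # stands in for copying the data
    return lost


block_map = {"blk_0": {"dn1", "dn2", "dn3"}, "blk_1": {"dn1"}}
lost = re_replicate(block_map, failed="dn1", live_nodes=["dn2", "dn3", "dn4"])
print(block_map["blk_0"], lost)
```

Here `blk_1` had its only copy on the failed node, so it cannot be restored, which is the case the note on this slide warns about.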
+MapReduce
Similar to divide-and-conquer
First, use “Map” to divide the task
Second, use “Shuffle” to “transfer the data from the mapper nodes to a reducer’s node and decompress if needed.”
Third, use “Reduce” to “execute the user-defined reduce function to produce the final output data.”
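The three phases can be illustrated with a word count, a common introductory MapReduce example. The sketch below runs in a single Python process and does not use the Hadoop API; the function names and the two input splits are made up for illustration.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in split.split()]


def shuffle_phase(mapped):
    """Shuffle: gather the values of each key from all mappers in one place."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: the user-defined function, here a sum of the counts."""
    return key, sum(values)


splits = ["big data big", "data node"]        # one split per mapper
mapped = [map_phase(s) for s in splits]
grouped = shuffle_phase(mapped)
result = dict(reduce_phase(k, v) for k, v in grouped.items())
print(result)
```

In a real cluster the mappers and reducers run on different nodes, so the shuffle step is where data moves across the network.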
+MapReduce-Map
+MapReduce-Shuffle
+MapReduce-Reduce
+MapReduce
+Comparison
+Comparison
+Why Hadoop? technically
Comparison of Grep Task Result with Vertica and DBMS-X
+Why Hadoop? technically
Simple structure vs. optimization
Transaction time is not minimized
Lower performance with the same number of nodes
No compelling reason to choose Hadoop
+Why Hadoop? commercially
+Why Hadoop? commercially
Cheap (buy more servers to beat a DBMS)
Flexible (both in design and deployment)
Easier to design
Easier to scale up
Can be combined with other systems to achieve better performance
+Conclusion
Hadoop is much easier for users to implement and more economical
MapReduce advocates should study the techniques used in parallel DBMSs
Hybrid systems are also popular
As its performance improves, we believe Hadoop will lead the trend of big data computing
+Reference
http://hadoop.apache.org/
http://www.runpc.com.tw/content/cloud_content.aspx?id=105318
http://en.wikipedia.org/wiki/Apache_Hadoo
https://www.facebookbrand.com/
http://assets.fontsinuse.com/static/use-media-items/15/14246/full-2048x768/522903b7/Yahoo_Logo.png
http://wiki.apache.org/hadoop/PoweredBy
http://semiaccurate.com/assets/uploads/2011/09/Amazon-logo.jpg
http://www.conceptcupboard.com/blog/wp-content/uploads/2013/09/google.jpg
+Reference
http://datashieldcorp.com/files/2013/11/adobe-LOGO-2.jpg
http://upload.wikimedia.org/wikipedia/commons/7/77/The_New_York_Times_logo.png
http://i.dell.com/sites/content/business/solutions/whitepapers/en/Documents/hadoop-introduction.pdf
http://hadoop.intel.com/pdfs/IntelDistributionReferenceArchitecture.pdf
http://www.google.com.tw/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CDQQFjAB&url=http%3A%2F%2Fwww.classcloud.org%2Fcloud%2Fraw-attachment%2Fwiki%2FHinet100402%2F02.HadoopOverview.pdf&ei=IE2XUtLfBMfxiAea_oHQCA&usg=AFQjCNFoIXxLJrOnoul4cKJpQ8v3_kuTYg
+Reference
http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture-Hadoop-Deployment-Comparison-Study.pdf
https://www.google.com.tw/url?sa=t&rct=j&q&esrc=s&source=web&cd=1&ved=0CCkQFjAA&url=http%3A%2F%2Fwww.psgtech.edu%2Fyrgcc%2Fattach%2FMAP%2520REDUCE%2520PROGRAMMING.ppt&ei=7lGXUtvCJsy5iAfWtYH4Bw&usg=AFQjCNGWRKJLal-tvbvORULZV6_Te2y74g&sig2=Ba77ihsV1SEqcNeEFkRzfg
https://www.cs.duke.edu/starfish/files/hadoop-models.pdf
http://dotnetmis91.blogspot.tw/2010/04/hdfs-hadoop-mapreduce.html
http://wiki.apache.org/hadoop/HDFS
http://www.ewdna.com/2013/04/Hadoop-HDFS-Comics.html
+Reference
http://en.wikipedia.org/wiki/Interpreted_language
A Comparison of Approaches to Large-Scale Data Analysis by Sam Madden
http://www.cc.ntu.edu.tw/chinese/epaper/0011/20091220_1106.htm
http://web.cs.wpi.edu/~cs561/s12/Lectures/6/Hadoop.pdf
http://www.mobilemartin.com/mobile/show-me-the-mobile-money.jpg