MapReduce: Simplified Data Processing on Large Clusters
P76001027 謝光昱, P76011284 陳志豪
Operating Systems Design and Implementation, 2004
Jeffrey Dean, Sanjay Ghemawat
Outline
Introduction
Programming Model
Implementation
Refinements
Performance
Experience
Conclusions
Introduction
Motivation: most computations are conceptually straightforward, but the input data is usually large and the computation has to finish in a reasonable amount of time.
Problem: the following concerns obscure the simple computation with large amounts of complex code to deal with them:
Parallelizing the computation
Distributing the data
Handling failures
Solution: design a new abstraction, MapReduce
Programming Model
The computation takes a set of input key/value pairs and produces a set of output key/value pairs.
Map: takes an input pair and produces a set of intermediate key/value pairs.
The library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.
Reduce: accepts an intermediate key I and a set of values for that key.
Merges these values together to form a possibly smaller set of values.
Programming Model
Example – WordCount

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
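The same word count can be sketched as runnable Python. The driver below is illustrative (names like `run` and `map_fn` are not the library's API); it performs the grouping step the library does between the map and reduce phases:

```python
from collections import defaultdict

def map_fn(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield (word, "1")

def reduce_fn(key, values):
    # key: a word, values: a list of string counts
    return str(sum(int(v) for v in values))

def run(documents):
    # Group all intermediate values sharing the same key,
    # then hand each group to the reduce function.
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for k, v in map_fn(name, contents):
            intermediate[k].append(v)
    return {k: reduce_fn(k, vs) for k, vs in intermediate.items()}

print(run({"doc1": "the quick the"}))  # {'the': '2', 'quick': '1'}
```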
Programming Model
[Figure: letter-count on the input string "Programming". The Map tasks emit one <letter,"1"> pair per character: <P,"1"> <r,"1"> <o,"1"> <g,"1"> <r,"1"> <a,"1"> <m,"1"> <m,"1"> <i,"1"> <n,"1"> <g,"1">; Reduce merges them into <P,"1"> <a,"1"> <g,"2"> <i,"1"> <m,"2"> <n,"1"> <o,"1"> <r,"2">.]
Implementation
Execution Overview
[Figure: the user program forks a master and a set of workers. The master assigns map tasks and reduce tasks to idle workers. Map workers read the input splits (Split 0 to Split 4) and write intermediate files to their local disks; reduce workers remote-read the intermediate data and write the final output files (Output file 0 and Output file 1).]
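The flow in the execution overview can be collapsed into a single-process sketch (illustrative only; the real system forks a master and distributes the tasks across worker machines, with intermediate data on local disks):

```python
from collections import defaultdict

def execute(map_fn, reduce_fn, splits, R):
    # Map phase: each split yields intermediate pairs, partitioned
    # into R regions by hashing the key (buffered to local disk in
    # the real implementation).
    regions = [defaultdict(list) for _ in range(R)]
    for split in splits:
        for k, v in map_fn(split):
            regions[hash(k) % R][k].append(v)
    # Reduce phase: each reduce task reads one region and produces
    # one output "file" (here, a dict per region).
    return [{k: reduce_fn(k, vs) for k, vs in sorted(region.items())}
            for region in regions]
```

For word count, `map_fn` would yield `(word, 1)` pairs and `reduce_fn` would sum the values; the R output dicts together hold every key exactly once.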
Implementation
Fault Tolerance: Worker Failure
Any map tasks completed by the failed worker are reset back to the idle state and rescheduled on other workers, because their output is stored on the failed machine's local disk and is therefore inaccessible.
[Figure: the master pings every worker periodically; a worker that does not respond in time is marked as failed.]
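The ping-based failure detection in the figure can be sketched as follows (function and parameter names are assumptions, not the actual implementation):

```python
def detect_failed_workers(workers, last_response, timeout, now):
    # The master pings every worker periodically; a worker whose last
    # response is older than the timeout is marked as failed.
    return [w for w in workers
            if now - last_response.get(w, 0.0) > timeout]
```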
Implementation
Fault Tolerance: Master Failure
The master writes periodic checkpoints of its data structures, so a new copy can restart from the last checkpointed state.
Given that there is only a single master, its failure is unlikely.
Implementation
Locality
To conserve network bandwidth, the input data (managed by GFS) is stored on the local disks of the cluster machines, and the master attempts to schedule each map task on a machine that holds a replica of the corresponding input data.
[Figure: input files stored as 64MB blocks replicated across workers; the master assigns map tasks to workers that hold local copies.]
Implementation
Backup Tasks
The total time is lengthened by stragglers.
Straggler: a machine that takes an unusually long time to complete one of the last few tasks in the computation.
E.g. a machine with a bad disk may suffer slow read performance.
When a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks.
A task is marked as completed whenever either the primary or the backup execution completes.
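A minimal sketch of the backup-task idea, using Python threads as stand-ins for worker machines (the helper name is illustrative):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_with_backup(task, *args):
    # Near the end of the job, schedule a backup execution of an
    # in-progress task; the task is marked completed as soon as
    # either the primary or the backup copy finishes.
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(task, *args)
        backup = pool.submit(task, *args)
        done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
        return next(iter(done)).result()
```

This relies on the task being deterministic, so it does not matter which copy's result is taken.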
Refinements
Partitioning Function
Data gets partitioned across the reduce tasks using a partitioning function on the intermediate key.
The default partitioning function is hashing, e.g. hash(key) mod R.
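A sketch of the default partitioning, assuming a stable hash (MD5 here; the slide does not name the actual hash function):

```python
import hashlib

def partition(key, R):
    # Default partitioning: hash(key) mod R. A stable hash ensures
    # every pair with the same key goes to the same reduce task.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % R
```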
Refinements
Combiner Function
There is significant repetition in the intermediate keys produced by each map task; word frequencies tend to follow a Zipf distribution, so there are many pairs such as <the,"1"> and <of,"1">.
The only difference between a reduce function and a combiner function is how their output is handled: the combiner's output is written to an intermediate file that feeds the shuffle, while the reduce function's output goes to the final output file.
[Figure: for the input "a b b a c c b c", one map worker emits <a,"1"> <b,"1"> <b,"1"> <a,"1"> and another emits <c,"1"> <c,"1"> <b,"1"> <c,"1">; combiners merge them locally into <a,"2"> <b,"2"> and <b,"1"> <c,"3">, and the reduce worker produces <a,"2"> <b,"3"> <c,"3">.]
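The combiner in the figure can be sketched as follows, assuming word-count-style string counts:

```python
from collections import Counter

def combine(pairs):
    # Partial merge on the map worker: sum the per-key counts before
    # the intermediate data is written and shipped over the network.
    # Same logic as the word-count reducer, but its output feeds the
    # shuffle rather than the final output file.
    counts = Counter()
    for key, v in pairs:
        counts[key] += int(v)
    return [(k, str(n)) for k, n in counts.items()]
```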
Refinements
Skipping Bad Records
Some bugs in the Map or Reduce function crash deterministically on certain records and prevent a MapReduce operation from completing.
Sometimes fixing the bug is not feasible, e.g. when it lives in a third-party library whose source code is unavailable.
Sometimes it is also acceptable to ignore a few records, e.g. in statistical analysis over a large data set.
Method: each worker process installs a signal handler that catches segmentation violations and bus errors, and reports the offending record's sequence number to the master.
If the master has seen more than one failure on a particular record, it indicates that the record should be skipped when it issues the next re-execution of the task.
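An illustrative stand-in: the real workers trap segmentation violations and bus errors in a signal handler, which a Python sketch can only approximate with exception handling. Names and the `skip` protocol are assumptions:

```python
def map_with_skipping(map_fn, records, skip):
    # `skip` holds sequence numbers the master has decided to skip
    # after repeated failures on the same record. Failures observed
    # here are returned so they can be reported to the master.
    output, failed = [], []
    for seq, record in enumerate(records):
        if seq in skip:
            continue  # master flagged this record: skip it
        try:
            output.extend(map_fn(record))
        except Exception:
            failed.append(seq)
    return output, failed
```

On re-execution, the master would pass the previously failed sequence numbers in `skip`, letting the task complete.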
Refinements
Status Information
The master runs an internal HTTP server and exports a set of status pages for human consumption.
Refinements
Counters
The MapReduce library provides a counter facility to count occurrences of various events.
E.g. user code may want to count the total number of words processed.
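A minimal sketch of how per-worker counts might be aggregated by the master (the class and method names are assumptions, not the library's API):

```python
class EventCounter:
    # Each worker keeps local counts that are periodically propagated
    # to the master, which aggregates them per counter name.
    def __init__(self):
        self.totals = {}

    def aggregate(self, worker_counts):
        for name, n in worker_counts.items():
            self.totals[name] = self.totals.get(name, 0) + n

master = EventCounter()
master.aggregate({"words_processed": 3})  # report from worker 1
master.aggregate({"words_processed": 5})  # report from worker 2
print(master.totals["words_processed"])   # 8
```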
Performance
Cluster Configuration
Approximately 1800 machines, each with:
Two 2GHz Intel Xeon processors with Hyper-Threading enabled
4GB of memory
Two 160GB IDE disks
A gigabit Ethernet link
The machines were arranged in a two-level tree-shaped switched network.
Performance
Grep
Scans through 10^10 100-byte records, searching for a relatively rare three-character pattern.
The input is split into approximately 64MB pieces (M = 15000), and the entire output is placed in one file (R = 1).
[Figure: data transfer rate over time. The rate peaks once 1764 workers have been assigned, and the entire computation takes about 150 seconds, of which roughly the first minute is startup overhead: propagation of the program to the workers, and interaction with GFS.]
Performance
Sort
The program sorts 10^10 100-byte records.
The input data is split into 64MB pieces (M = 15000), and the sorted output is partitioned into 4000 files (R = 4000).
[Figure: input, shuffle, and output rates over time (axis marks at 300 and 600 seconds). The input rate is less than for grep because the sort map tasks also write intermediate output to their local disks. Shuffling starts as soon as the first map task completes; a first batch of reduce tasks runs, followed by the remaining reduce tasks, with output writing delayed until around the 850-second mark. Input rate > shuffle rate > output rate: the input rate is high thanks to the locality optimization, and the output rate is low because two copies of each output are written for reliability and availability.]
Performance
[Figure: the same sort with backup tasks disabled. After 960 seconds all but the last few reduce tasks are completed, but these stragglers do not finish until 1283 seconds, an increase of 44% in elapsed time.]
Performance
[Figure: the same sort where 200 worker processes were intentionally killed partway through the computation. The lost map work is simply re-executed, and the whole computation still finishes with only a 5% increase in elapsed time.]
Experience
Broad applications within Google:
Large-scale machine learning problems
Clustering problems for the Google News and Froogle products
Extraction of data used to produce reports of popular queries
Extraction of properties of web pages for new experiments and products
Large-scale graph computations
Experience
Large-Scale Indexing
MapReduce was used to rewrite the production indexing system that produces the data structures used for the Google web search service.
Benefits of using MapReduce:
The indexing code is simpler, smaller, and easier to understand, e.g. one phase dropped from approximately 3800 lines of C++ to approximately 700 lines using MapReduce.
The performance of the MapReduce library is good enough that it is easy to change the indexing process.
It is easier to add new machines to the indexing cluster.
Conclusions
The MapReduce programming model has been used successfully for many different purposes, for several reasons:
The model is easy to use
A large variety of problems are easily expressible
The implementation scales to large clusters comprising thousands of machines
Redundant execution can be used to reduce the impact of slow machines, and to handle machine failures and data loss
Thank you!!