Page 1:

MapReduce: Simplified Data Processing on Large Clusters

P76001027 謝光昱, P76011284 陳志豪

Operating Systems Design and Implementation, 2004. Jeffrey Dean, Sanjay Ghemawat

Page 2:

Outline

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 3:

Introduction

Motive: Most such computations are conceptually straightforward, but the input data is usually large and the computation has to finish in a reasonable amount of time.

Problem: The following issues obscure the original simple computation with large amounts of complex code:

Parallelize the computation

Distribute the data

Handle failures

Solution: Design a new abstraction - MapReduce

Page 4:

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 5:

Programming Model

The computation takes a set of input key/value pairs, and produces a set of output key/value pairs.

Map: Takes an input pair and produces a set of intermediate key/value pairs.

The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.

Reduce: Accepts an intermediate key I and a set of values for that key.

Merges together these values to form a possibly smaller set of values.

Page 6:

Programming Model

Example – WordCount

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));

[Figure: the string "Programming" is fed through Map, which emits a <letter,"1"> pair for each character; Reduce then merges the pairs per key into <P,"1"> <a,"1"> <g,"2"> <i,"1"> <m,"2"> <n,"1"> <o,"1"> <r,"2">.]
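To make the pseudocode above concrete, here is a minimal single-process Python sketch of the same WordCount computation (the names map_fn, reduce_fn, and run_mapreduce are our own illustration, not part of the MapReduce library):

from collections import defaultdict

def map_fn(key, value):
    # key: document name; value: document contents
    for word in value.split():
        yield (word, "1")

def reduce_fn(key, values):
    # key: a word; values: a list of counts (as strings)
    return str(sum(int(v) for v in values))

def run_mapreduce(documents):
    # Map phase: collect intermediate values, grouped by intermediate key.
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for k, v in map_fn(name, contents):
            intermediate[k].append(v)
    # Reduce phase: merge the values for each key into a smaller set.
    return {k: reduce_fn(k, vs) for k, vs in intermediate.items()}

print(run_mapreduce({"doc1": "the quick fox", "doc2": "the lazy dog"}))
# -> {'the': '2', 'quick': '1', 'fox': '1', 'lazy': '1', 'dog': '1'}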

Page 7:

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 8:

Implementation

Execution Overview

[Figure: execution overview - the user program forks a master and many workers; the master assigns map and reduce tasks; map workers read input splits 0-4 and write intermediate files to their local disks; reduce workers do remote reads of those files and write output files 0 and 1. Stages: Input files -> Map phase -> Intermediate files (on local disks) -> Reduce phase -> Output files.]
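As a rough, single-process sketch of this flow (no real master, workers, or GFS; M, R, and the helper execute are our own names), the phases can be lined up like this:

from collections import defaultdict

def execute(records, map_fn, reduce_fn, M=5, R=2):
    # 1. Split the input into M pieces (here: round-robin over the records).
    splits = [records[i::M] for i in range(M)]

    # 2. Map phase: each "map task" emits key/value pairs, partitioned into
    #    R buckets by hash(key) mod R (standing in for the R intermediate
    #    files on the map worker's local disk).
    buckets = [defaultdict(list) for _ in range(R)]
    for split in splits:
        for record in split:
            for k, v in map_fn(record):
                buckets[hash(k) % R][k].append(v)

    # 3. Reduce phase: reduce task r reads partition r, processes keys in
    #    sorted order, and writes one output "file" (here: a dict).
    return [{k: reduce_fn(k, vs) for k, vs in sorted(bucket.items())}
            for bucket in buckets]

docs = ["the quick fox", "the lazy dog jumps over the fox"]
print(execute(docs,
              lambda doc: [(w, 1) for w in doc.split()],
              lambda k, vs: sum(vs)))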

Page 9:

Implementation

Fault Tolerance: Worker Failure

Any map tasks completed by the failed worker are reset back to the idle state and rescheduled on other workers, because their intermediate output is stored on the failed machine's local disk and is no longer accessible.

[Figure: the master periodically pings every worker; a worker that does not respond in time is marked failed.]
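A hedged sketch of the master-side bookkeeping on a worker failure (the Task structure and field names are illustrative, not the paper's actual data structures):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    kind: str                    # "map" or "reduce"
    state: str                   # "idle", "in-progress", or "completed"
    worker: Optional[str] = None

def handle_worker_failure(tasks, failed_worker):
    # Called when a worker has not answered the master's pings in time.
    for t in tasks:
        if t.worker != failed_worker:
            continue
        # Completed map tasks are reset too: their intermediate output
        # lives on the failed machine's local disk and is now unreachable.
        if t.kind == "map" and t.state in ("in-progress", "completed"):
            t.state, t.worker = "idle", None
        # Completed reduce output is already in the global file system,
        # so only in-progress reduce tasks need to be re-executed.
        elif t.kind == "reduce" and t.state == "in-progress":
            t.state, t.worker = "idle", None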

Page 10:

Implementation

Master Failure

The master writes periodic checkpoints of the master data structures, so a new copy of the master can restart from the last checkpointed state.

Since there is only a single master, its failure is unlikely.
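A minimal checkpoint-and-restart sketch, assuming the master state can be serialized to JSON (the file name and format here are hypothetical; the paper does not specify them):

import json, os

CHECKPOINT = "master_checkpoint.json"   # hypothetical path/format

def write_checkpoint(master_state):
    # Periodically persist the master data structures (task states,
    # worker assignments, locations of intermediate files, ...).
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(master_state, f)
    os.replace(tmp, CHECKPOINT)  # atomic swap: a crash never leaves a torn checkpoint

def recover_master():
    # A new master copy restarts from the last checkpointed state.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return None                  # no checkpoint yet: start from scratch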

Page 11:

Implementation

Locality

To conserve network bandwidth, the input data (managed by GFS) is stored on the local disks of the machines in the cluster.

[Figure: the input files are split into 64MB blocks and copies of each block are placed on several workers' local disks; the master takes these locations into account when assigning map tasks.]
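A small sketch of locality-aware task assignment, assuming the master knows which machines hold a replica of each input split (the split and machine names below are made up for illustration):

def pick_map_task(idle_splits, worker, replica_locations):
    # Prefer a split that has a replica on this worker's machine, so the
    # map task reads its input from local disk instead of over the network.
    for split in idle_splits:
        if worker in replica_locations.get(split, ()):
            return split
    # Otherwise fall back to any idle split (the real scheduler also tries
    # a replica on the same network switch first).
    return idle_splits[0] if idle_splits else None

replicas = {"split0": {"m1", "m2"}, "split1": {"m3", "m4"}}
print(pick_map_task(["split0", "split1"], "m3", replicas))  # -> split1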

Page 12:

Implementation

Backup Tasks

The total time of a MapReduce operation is often lengthened by stragglers.

Straggler: a machine that takes an unusually long time to complete one of the last few tasks in the computation.

E.g. a machine with a bad disk may suffer from slow read performance.

When a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks.

The task is marked as completed whenever either the primary or the backup execution completes.
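A sketch of the backup-task idea (the 95% "close to completion" threshold and the task dictionaries are our own simplifications; the paper does not give the exact trigger):

def schedule_backups(tasks, idle_workers, threshold=0.95):
    # tasks: list of dicts like {"state": "in-progress", "worker": "m1", "backup": None}
    done = sum(t["state"] == "completed" for t in tasks)
    if done < threshold * len(tasks):
        return                       # not yet close to completion
    for t in tasks:
        if t["state"] == "in-progress" and t["backup"] is None and idle_workers:
            t["backup"] = idle_workers.pop()   # launch a backup execution

def task_finished(task):
    # Whichever of the primary or the backup finishes first wins; the task
    # is marked completed and the other copy's result is simply ignored.
    task["state"] = "completed"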

Page 13:

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 14:

Refinements

Partitioning Function

The intermediate data gets partitioned across the R reduce tasks using a partitioning function on the intermediate key.

The default partitioning function uses hashing, e.g. hash(key) mod R.
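For illustration, a default hash partitioner next to the paper's example of a user-specified one keyed on the URL's hostname (urlparse is just one convenient way to extract it; R = 4000 matches the sort setup later in the talk):

from urllib.parse import urlparse

R = 4000   # number of reduce tasks / output files

def default_partition(key):
    # Default: hash the intermediate key to spread keys evenly over R tasks.
    return hash(key) % R

def hostname_partition(url_key):
    # User-specified alternative from the paper: all URLs from the same
    # host end up in the same output file.
    return hash(urlparse(url_key).hostname) % R

print(hostname_partition("http://example.com/a") ==
      hostname_partition("http://example.com/b"))   # True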

Page 15:

Refinements

Combiner Function

There is often significant repetition in the intermediate keys produced by each map task.

Zipf distribution

E.g. <the,"1">, <of,"1">

The only difference between a reduce function and a combiner function is how the library handles their output: combiner output is written to an intermediate file, while reduce output is written to the final output file.

[Figure: the input "a b b a c c b c" is split across two map workers; each worker's combiner partially merges its own pairs (<a,"1"> <b,"1"> <b,"1"> <a,"1"> becomes <a,"2"> <b,"2">, and <c,"1"> <c,"1"> <b,"1"> <c,"1"> becomes <b,"1"> <c,"3">), and the reduce worker then produces the final <a,"2"> <b,"3"> <c,"3">.]
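A sketch of the combiner doing partial merging on the map worker (here the combiner body happens to look like the reducer; the real difference is only where the output goes):

from collections import defaultdict

def combine(map_output):
    # map_output: pairs emitted by a single map task, e.g. for "a b b a"
    partial = defaultdict(int)
    for key, count in map_output:
        partial[key] += count
    # The partially merged pairs are written to an intermediate file and
    # later shuffled to the reduce tasks, which merge the partial counts again.
    return sorted(partial.items())

print(combine([("a", 1), ("b", 1), ("b", 1), ("a", 1)]))   # [('a', 2), ('b', 2)]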

Page 16:

Refinements

Skipping Bad Records

Bugs in user code can cause the Map or Reduce function to crash deterministically on certain records and prevent a MapReduce operation from completing.

Sometimes fixing the bug is not feasible, e.g. when it lives in a third-party library whose source code is unavailable.

Sometimes it is acceptable to ignore a few records.

E.g. statistical analysis

Method

Each worker process installs a signal handler to catch segmentation violations and bus errors.

If the master sees more than one failure on a particular record, it indicates that the record should be skipped when it issues the next re-execution of the task.
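A rough sketch of the skipping logic, using a Python exception in place of the signal handler that the C++ library installs (the record ids, failure counter, and skip_set below are illustrative names):

from collections import Counter

failure_counts = Counter()   # master side: record id -> crashes observed
skip_set = set()             # record ids the master tells workers to skip

def report_failure(record_id):
    # The dying worker reports the id of the offending record to the master;
    # after more than one failure the record is skipped on re-execution.
    failure_counts[record_id] += 1
    if failure_counts[record_id] > 1:
        skip_set.add(record_id)

def run_map_task(records, map_fn):
    output = []
    for record_id, record in records:
        if record_id in skip_set:
            continue                 # skip records the master flagged
        try:
            output.extend(map_fn(record))
        except Exception:
            report_failure(record_id)
    return output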

Page 17:

Refinements

Status Information

The master runs an internal HTTP server and exports a set of status pages for human consumption.

Page 18:

Refinements

Page 19:

Refinements

Page 20:

Refinements

Counters

The MapReduce library provides a counter facility to count occurrences of various events.

E.g. user code may want to count the total number of words processed.
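A sketch of what the counter facility looks like from user code (the counter name words_processed is illustrative; in the real library per-worker counts are periodically propagated to the master, which aggregates them and eliminates duplicates from backup executions):

from collections import Counter

counters = Counter()   # per-worker counter values, periodically sent to the master

def word_count_map(name, contents):
    for word in contents.split():
        counters["words_processed"] += 1   # user code counts an event
        yield (word, "1")

list(word_count_map("doc1", "hello counter world"))
print(counters["words_processed"])         # -> 3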

Page 21:

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 22:

Performance

Cluster Configuration

Approximately 1800 machines

Each machine had:

Two 2GHz Intel Xeon processors with Hyper-Threading enabled

4GB of memory

Two 160GB IDE disks

A gigabit Ethernet link

The machines were arranged in a two-level tree-shaped switched network

Page 23:

Performance

Grep

Scans through 10^10 100-byte records, searching for a relatively rare three-character pattern.

The input is split into approximately 64MB pieces (M = 15000), and the entire output is placed in one file (R = 1).

[Figure: data transfer rate over time for grep with 1764 workers; roughly the first minute is startup overhead (propagation of the program to the workers, interacting with GFS to open the input files), and the whole computation finishes in about 150 seconds.]

Page 24:

Performance

Sort

The program sorts 10^10 100-byte records.

The input data is split into 64MB pieces (M = 15000), and the sorted output is partitioned into 4000 files (R = 4000).

Page 25:

Performance

[Figure: input, shuffle, and output data transfer rates over time for the sort program (normal execution).]

The input rate is less than for grep, but the locality optimization keeps it above the shuffle rate, which in turn is above the output rate: input rate > shuffle rate > output rate.

The shuffling starts as soon as the first map task completes; the first hump in the shuffle curve is the first batch of reduce tasks, and the second hump is the remaining reduce tasks.

There is a delay between the end of shuffling and the start of the output phase (the machines are busy sorting the intermediate data), and the output rate is lower because two copies of the output are written for reliability and availability.

Page 26:

Performance

No backup tasks:

[Figure: data transfer rates for the sort program with backup tasks disabled; a handful of stragglers keeps running long after 960 seconds, and the entire computation takes 1283 seconds - an increase of 44% over the normal execution.]

Page 27:

Performance

Machine failures:

[Figure: data transfer rates for the sort program while 200 worker processes are intentionally killed partway through ("worker death"); the lost map work is simply re-executed, and the total running time grows by only 5% over the normal execution.]

Page 28:

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 29:

Experience

Broad applications inside Google:

large-scale machine learning problems

clustering problems for the Google News and Froogle products

extraction of data used to produce reports of popular queries

extraction of properties of web pages for new experiments and products

large-scale graph computations

Page 30:

Experience

Large-Scale Indexing

MapReduce was used to rewrite the production indexing system that produces the data structures used for the Google web search service.

Benefits of using MapReduce:

The indexing code is simpler, smaller, and easier to understand.

E.g. one phase dropped from about 3800 lines of C++ to about 700 lines when expressed using MapReduce.

The performance of the MapReduce library is good enough that conceptually unrelated computations can be kept separate, which makes it easy to change the indexing process.

It’s easier to add new machines to the indexing cluster.

Page 31:

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 32:

Conclusions

The reasons why the MapReduce programming model has been used successfully for many different purposes:

The model is easy to use

A large variety of problems are easily expressible

An implementation of MapReduce was developed that scales to large clusters comprising thousands of machines

Redundant execution can be used to reduce the impact of slow machines, and to handle machine failures and data loss

Page 33:

Thank you!!
