Page 1:

MapReduce: Simplified Data Processing on Large Clusters

P76001027 謝光昱, P76011284 陳志豪

Operating Systems Design and Implementation, 2004. Jeffrey Dean, Sanjay Ghemawat

Page 2:

Outline

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 3:

Introduction

Motive: Most such computations are conceptually straightforward, but the input data is usually large and the computation has to finish in a reasonable amount of time.

Problem: The following issues obscure the original simple computation with large amounts of complex code:

Parallelize the computation

Distribute the data

Handle failures

Solution: Design a new abstraction - MapReduce

Page 4:

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 5:

Programming Model

The computation takes a set of input key/value pairs, and produces a set of output key/value pairs.

Map: Takes an input pair and produces a set of intermediate key/value pairs.

The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.

Reduce: Accepts an intermediate key I and a set of values for that key.

Merges together these values to form a possibly smaller set of values.

Page 6:

Programming Model

Example – WordCount

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));

[Figure: the string "Programming" is fed through Map, which emits a <letter,"1"> pair for each character; Reduce then merges the pairs per key into <P,"1"> <a,"1"> <g,"2"> <i,"1"> <m,"2"> <n,"1"> <o,"1"> <r,"2">.]
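To make the pseudocode above concrete, here is a minimal single-process Python sketch of the same WordCount computation (the names map_fn, reduce_fn, and run_mapreduce are our own illustration, not part of the MapReduce library):

from collections import defaultdict

def map_fn(key, value):
    # key: document name; value: document contents
    for word in value.split():
        yield (word, "1")

def reduce_fn(key, values):
    # key: a word; values: a list of counts (as strings)
    return str(sum(int(v) for v in values))

def run_mapreduce(documents):
    # Map phase: collect intermediate values, grouped by intermediate key.
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for k, v in map_fn(name, contents):
            intermediate[k].append(v)
    # Reduce phase: merge the values for each key into a smaller set.
    return {k: reduce_fn(k, vs) for k, vs in intermediate.items()}

print(run_mapreduce({"doc1": "the quick fox", "doc2": "the lazy dog"}))
# -> {'the': '2', 'quick': '1', 'fox': '1', 'lazy': '1', 'dog': '1'}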

Page 7:

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 8:

Implementation

Execution Overview

[Figure: execution overview - the user program forks a master and many workers; the master assigns map and reduce tasks; map workers read input splits 0-4 and write intermediate files to their local disks; reduce workers do remote reads of those files and write output files 0 and 1. Stages: Input files -> Map phase -> Intermediate files (on local disks) -> Reduce phase -> Output files.]
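As a rough, single-process sketch of this flow (no real master, workers, or GFS; M, R, and the helper execute are our own names), the phases can be lined up like this:

from collections import defaultdict

def execute(records, map_fn, reduce_fn, M=5, R=2):
    # 1. Split the input into M pieces (here: round-robin over the records).
    splits = [records[i::M] for i in range(M)]

    # 2. Map phase: each "map task" emits key/value pairs, partitioned into
    #    R buckets by hash(key) mod R (standing in for the R intermediate
    #    files on the map worker's local disk).
    buckets = [defaultdict(list) for _ in range(R)]
    for split in splits:
        for record in split:
            for k, v in map_fn(record):
                buckets[hash(k) % R][k].append(v)

    # 3. Reduce phase: reduce task r reads partition r, processes keys in
    #    sorted order, and writes one output "file" (here: a dict).
    return [{k: reduce_fn(k, vs) for k, vs in sorted(bucket.items())}
            for bucket in buckets]

docs = ["the quick fox", "the lazy dog jumps over the fox"]
print(execute(docs,
              lambda doc: [(w, 1) for w in doc.split()],
              lambda k, vs: sum(vs)))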

Page 9:

Implementation

Fault Tolerance: Worker Failure

Any map tasks completed by the failed worker are reset back to the idle state and rescheduled on other workers, because their intermediate output is stored on the failed machine's local disk and is no longer accessible.

[Figure: the master periodically pings every worker; a worker that does not respond in time is marked failed.]
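A hedged sketch of the master-side bookkeeping on a worker failure (the Task structure and field names are illustrative, not the paper's actual data structures):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    kind: str                    # "map" or "reduce"
    state: str                   # "idle", "in-progress", or "completed"
    worker: Optional[str] = None

def handle_worker_failure(tasks, failed_worker):
    # Called when a worker has not answered the master's pings in time.
    for t in tasks:
        if t.worker != failed_worker:
            continue
        # Completed map tasks are reset too: their intermediate output
        # lives on the failed machine's local disk and is now unreachable.
        if t.kind == "map" and t.state in ("in-progress", "completed"):
            t.state, t.worker = "idle", None
        # Completed reduce output is already in the global file system,
        # so only in-progress reduce tasks need to be re-executed.
        elif t.kind == "reduce" and t.state == "in-progress":
            t.state, t.worker = "idle", None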

Page 10:

Implementation

Master Failure

The master writes periodic checkpoints of the master data structures, so a new copy of the master can restart from the last checkpointed state.

Since there is only a single master, its failure is unlikely.
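A minimal checkpoint-and-restart sketch, assuming the master state can be serialized to JSON (the file name and format here are hypothetical; the paper does not specify them):

import json, os

CHECKPOINT = "master_checkpoint.json"   # hypothetical path/format

def write_checkpoint(master_state):
    # Periodically persist the master data structures (task states,
    # worker assignments, locations of intermediate files, ...).
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(master_state, f)
    os.replace(tmp, CHECKPOINT)  # atomic swap: a crash never leaves a torn checkpoint

def recover_master():
    # A new master copy restarts from the last checkpointed state.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return None                  # no checkpoint yet: start from scratch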

Page 11:

Implementation

Locality

To conserve network bandwidth, the input data (managed by GFS) is stored on the local disks of the machines in the cluster.

[Figure: the input files are split into 64MB blocks and copies of each block are placed on several workers' local disks; the master takes these locations into account when assigning map tasks.]
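A small sketch of locality-aware task assignment, assuming the master knows which machines hold a replica of each input split (the split and machine names below are made up for illustration):

def pick_map_task(idle_splits, worker, replica_locations):
    # Prefer a split that has a replica on this worker's machine, so the
    # map task reads its input from local disk instead of over the network.
    for split in idle_splits:
        if worker in replica_locations.get(split, ()):
            return split
    # Otherwise fall back to any idle split (the real scheduler also tries
    # a replica on the same network switch first).
    return idle_splits[0] if idle_splits else None

replicas = {"split0": {"m1", "m2"}, "split1": {"m3", "m4"}}
print(pick_map_task(["split0", "split1"], "m3", replicas))  # -> split1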

Page 12:

Implementation

Backup Tasks

The total time of a MapReduce operation is often lengthened by stragglers.

Straggler: a machine that takes an unusually long time to complete one of the last few tasks in the computation.

E.g. a machine with a bad disk may suffer from slow read performance.

When a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks.

The task is marked as completed whenever either the primary or the backup execution completes.
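A sketch of the backup-task idea (the 95% "close to completion" threshold and the task dictionaries are our own simplifications; the paper does not give the exact trigger):

def schedule_backups(tasks, idle_workers, threshold=0.95):
    # tasks: list of dicts like {"state": "in-progress", "worker": "m1", "backup": None}
    done = sum(t["state"] == "completed" for t in tasks)
    if done < threshold * len(tasks):
        return                       # not yet close to completion
    for t in tasks:
        if t["state"] == "in-progress" and t["backup"] is None and idle_workers:
            t["backup"] = idle_workers.pop()   # launch a backup execution

def task_finished(task):
    # Whichever of the primary or the backup finishes first wins; the task
    # is marked completed and the other copy's result is simply ignored.
    task["state"] = "completed"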

Page 13:

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 14:

Refinements

Partitioning Function

The intermediate data gets partitioned across the R reduce tasks using a partitioning function on the intermediate key.

The default partitioning function uses hashing, e.g. hash(key) mod R.
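For illustration, a default hash partitioner next to the paper's example of a user-specified one keyed on the URL's hostname (urlparse is just one convenient way to extract it; R = 4000 matches the sort setup later in the talk):

from urllib.parse import urlparse

R = 4000   # number of reduce tasks / output files

def default_partition(key):
    # Default: hash the intermediate key to spread keys evenly over R tasks.
    return hash(key) % R

def hostname_partition(url_key):
    # User-specified alternative from the paper: all URLs from the same
    # host end up in the same output file.
    return hash(urlparse(url_key).hostname) % R

print(hostname_partition("http://example.com/a") ==
      hostname_partition("http://example.com/b"))   # True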

Page 15:

Refinements

Combiner Function

There is often significant repetition in the intermediate keys produced by each map task.

Zipf distribution

E.g. <the,"1">, <of,"1">

The only difference between a reduce function and a combiner function is how the library handles their output: combiner output is written to an intermediate file, while reduce output is written to the final output file.

[Figure: the input "a b b a c c b c" is split across two map workers; each worker's combiner partially merges its own pairs (<a,"1"> <b,"1"> <b,"1"> <a,"1"> becomes <a,"2"> <b,"2">, and <c,"1"> <c,"1"> <b,"1"> <c,"1"> becomes <b,"1"> <c,"3">), and the reduce worker then produces the final <a,"2"> <b,"3"> <c,"3">.]
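A sketch of the combiner doing partial merging on the map worker (here the combiner body happens to look like the reducer; the real difference is only where the output goes):

from collections import defaultdict

def combine(map_output):
    # map_output: pairs emitted by a single map task, e.g. for "a b b a"
    partial = defaultdict(int)
    for key, count in map_output:
        partial[key] += count
    # The partially merged pairs are written to an intermediate file and
    # later shuffled to the reduce tasks, which merge the partial counts again.
    return sorted(partial.items())

print(combine([("a", 1), ("b", 1), ("b", 1), ("a", 1)]))   # [('a', 2), ('b', 2)]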

Page 16:

Refinements

Skipping Bad Records

Bugs in user code can cause the Map or Reduce function to crash deterministically on certain records and prevent a MapReduce operation from completing.

Sometimes fixing the bug is not feasible, e.g. when it lives in a third-party library whose source code is unavailable.

Sometimes it is acceptable to ignore a few records.

E.g. statistical analysis

Method

Each worker process installs a signal handler to catch segmentation violations and bus errors.

If the master sees more than one failure on a particular record, it indicates that the record should be skipped when it issues the next re-execution of the task.
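A rough sketch of the skipping logic, using a Python exception in place of the signal handler that the C++ library installs (the record ids, failure counter, and skip_set below are illustrative names):

from collections import Counter

failure_counts = Counter()   # master side: record id -> crashes observed
skip_set = set()             # record ids the master tells workers to skip

def report_failure(record_id):
    # The dying worker reports the id of the offending record to the master;
    # after more than one failure the record is skipped on re-execution.
    failure_counts[record_id] += 1
    if failure_counts[record_id] > 1:
        skip_set.add(record_id)

def run_map_task(records, map_fn):
    output = []
    for record_id, record in records:
        if record_id in skip_set:
            continue                 # skip records the master flagged
        try:
            output.extend(map_fn(record))
        except Exception:
            report_failure(record_id)
    return output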

Page 17:

Refinements

Status Information

The master runs an internal HTTP server and exports a set of status pages for human consumption.

Page 18:

Refinements

Page 19:

Refinements

Page 20:

Refinements

Counters

The MapReduce library provides a counter facility to count occurrences of various events.

E.g. user code may want to count the total number of words processed.
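A sketch of what the counter facility looks like from user code (the counter name words_processed is illustrative; in the real library per-worker counts are periodically propagated to the master, which aggregates them and eliminates duplicates from backup executions):

from collections import Counter

counters = Counter()   # per-worker counter values, periodically sent to the master

def word_count_map(name, contents):
    for word in contents.split():
        counters["words_processed"] += 1   # user code counts an event
        yield (word, "1")

list(word_count_map("doc1", "hello counter world"))
print(counters["words_processed"])         # -> 3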

Page 21:

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 22:

Performance

Cluster Configuration

Approximately 1800 machines

Each machine had:

Two 2GHz Intel Xeon processors with Hyper-Threading enabled

4GB of memory

Two 160GB IDE disks

A gigabit Ethernet link

The machines were arranged in a two-level tree-shaped switched network

Page 23:

Performance

Grep

Scans through 10^10 100-byte records, searching for a relatively rare three-character pattern.

The input is split into approximately 64MB pieces (M = 15000), and the entire output is placed in one file (R = 1).

[Figure: data transfer rate over time for grep with 1764 workers; roughly the first minute is startup overhead (propagation of the program to the workers, interacting with GFS to open the input files), and the whole computation finishes in about 150 seconds.]

Page 24:

Performance

Sort

The program sorts 10^10 100-byte records.

The input data is split into 64MB pieces (M = 15000), and the sorted output is partitioned into 4000 files (R = 4000).

Page 25:

Performance

[Figure: input, shuffle, and output data transfer rates over time for the sort program (normal execution).]

The input rate is less than for grep, but the locality optimization keeps it above the shuffle rate, which in turn is above the output rate: input rate > shuffle rate > output rate.

The shuffling starts as soon as the first map task completes; the first hump in the shuffle curve is the first batch of reduce tasks, and the second hump is the remaining reduce tasks.

There is a delay between the end of shuffling and the start of the output phase (the machines are busy sorting the intermediate data), and the output rate is lower because two copies of the output are written for reliability and availability.

Page 26:

Performance

No backup tasks:

[Figure: data transfer rates for the sort program with backup tasks disabled; a handful of stragglers keeps running long after 960 seconds, and the entire computation takes 1283 seconds - an increase of 44% over the normal execution.]

Page 27:

Performance

Machine failures:

[Figure: data transfer rates for the sort program while 200 worker processes are intentionally killed partway through ("worker death"); the lost map work is simply re-executed, and the total running time grows by only 5% over the normal execution.]

Page 28:

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 29:

Experience

Broad applications inside Google:

large-scale machine learning problems

clustering problems for the Google News and Froogle products

extraction of data used to produce reports of popular queries

extraction of properties of web pages for new experiments and products

large-scale graph computations

Page 30:

Experience

Large-Scale Indexing

MapReduce was used to rewrite the production indexing system that produces the data structures used for the Google web search service.

Benefits of using MapReduce:

The indexing code is simpler, smaller, and easier to understand.

E.g. one phase dropped from about 3800 lines of C++ to about 700 lines when expressed using MapReduce.

The performance of the MapReduce library is good enough that conceptually unrelated computations can be kept separate, which makes it easy to change the indexing process.

It’s easier to add new machines to the indexing cluster.

Page 31:

Introduction

Programming Model

Implementation

Refinements

Performance

Experience

Conclusions

Page 32:

Conclusions

The reasons why the MapReduce programming model has been used successfully for many different purposes:

The model is easy to use

A large variety of problems are easily expressible

An implementation of MapReduce was developed that scales to large clusters comprising thousands of machines

Redundant execution can be used to reduce the impact of slow machines, and to handle machine failures and data loss

Page 33:

Thank you!!
