Map Reduce: Introductory Edition (Understanding the Mechanism and Algorithm Design)

The 12th Data Mining + WEB @ Tokyo (#TokyoWebmining)

doryokujin

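Self-introduction

[Name] doryokujin (井上敬浩, Takahiro Inoue)

[Age] 26

[Major] Mathematics (statistics, probabilistic algorithms)

[Company] Geisha Tokyo Entertainment (GTE)

[Occupation] Data mining engineer

[Hobby] Marathon (42.195 km: 2 hours 33 minutes)

[Communities]

・MongoDB JP: More MongoDB for Japan!

・TokyoWebMining: a study group on methodologies for statistical analysis and data mining, and on making use of data on the web

・おしゃれStatistics (Oshare Statistics): a study group that learns statistics by reading through the classic book "Statistics", with @isseing333 and @dichika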

https://groups.google.com/group/mongodb-jp?hl=ja

https://groups.google.com/group/webmining-tokyo?hl=ja

http://www.amazon.co.jp/Statistics-David-Freedman/dp/0393930432/ref=sr_1_1?s=english-books&ie=UTF8&qid=1305452607&sr=1-1

http://twitter.com/#!/isseing333

http://twitter.com/dichika

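Announcements (3)

1st Distributed Processing Workshop

[Date] 2011/07/08 (Fri) 18:00 to 21:00

[ATND] http://atnd.org/events/17038

・Each session takes up a topic in distributed processing technology (NoSQL, distributed file systems, parallel programming models, large-scale data processing, and so on)

・A study group where speakers suited to the topic are invited and the discussion is held round-table style

・Desks are arranged in a circle so that participants can join the discussion too

[Content] The first session is on "NoSQL" (centered on scalability topics)

・Speakers on okuyama, HBase, Cassandra, MongoDB, and Hibari!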

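5th MongoDB Study Group: The Great Midsummer Mongo Festival

[Date] 2011/07/31 (Fri) 14:00 to 19:00

[ATND] http://atnd.org/events/17136

[Perks for participants]

・Participants receive MongoDB novelty goods (probably a mug)

[Content]

・Explanations of the mechanisms and features

・Real-world operation case studies, and so on

Speakers wanted → contact @doryokujin on Twitter!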

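Celebrating the Japanese edition: Data-Intensive Text Processing with MapReduce

・The long-awaited Japanese translation goes on sale from O'Reilly in August or September!

・Covers "algorithm design", "inverted indexing", "graph algorithms", and "EM algorithms"

・Jimmy Lin and Chris Dyer (authors)

・Ryuji Tamagawa (translator); the second edition of Hadoop also goes on sale around July!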

http://twitter.com/%23!/tamagawa_ryuji



















Agenda

1. Seven Ideas for Handling Big Data

2. What Is Map Reduce

3. Points to Watch When Using Map Reduce

4. Map Reduce Algorithm Design

5. About the Advanced Edition

1. Seven Ideas for Handling Big Data

(1) Scale "Out", not "Up"

(2) Failures are the norm

(3) Shared Nothing Computing Cluster

(4) Separate developers from high-level operations

(5) Move the Code to the Data

(6) Avoid random access; use sequential access

(7) Seamless scalability

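(1) Scale "Out", not "Up"

・Build the cluster from many low-end servers (Scale Out)

・Not from a small number of high-end servers (Scale Up)

・Machines with differing specs are fine

・It can also be deployed on cloud services such as AWS

(2) Failures are the norm

・A machine's lifetime is 1,000 days (about 3 years)

・Operating 10,000 servers means an average of 10 failures per day

・A mechanism that keeps processing going through multiple server failures is essential

・Work carried by a failed server is handed over seamlessly to other servers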
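The "10 failures per day" figure follows from simple arithmetic. The sketch below is only an illustration under the assumption that failures are independent and spread evenly over a machine's lifetime; the function names are hypothetical and not from the slides.

```python
def expected_failures_per_day(num_servers: int, mean_lifetime_days: float) -> float:
    """Steady-state estimate: each server fails on average once per lifetime."""
    return num_servers / mean_lifetime_days


def prob_at_least_one_failure(num_servers: int, mean_lifetime_days: float) -> float:
    """Chance that at least one server fails on a given day (independent failures assumed)."""
    p_single = 1.0 / mean_lifetime_days
    return 1.0 - (1.0 - p_single) ** num_servers


print(expected_failures_per_day(10_000, 1_000))  # 10.0
print(prob_at_least_one_failure(10_000, 1_000) > 0.9999)  # True
```

At this scale a day without any failure is the exception, which is why the framework rather than the application has to handle recovery.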

(3) Shared Nothing Computing Cluster

・Processors, local memory, and disk resources are not shared between nodes

Figure 2. Shared Nothing Computing Cluster.

Common Characteristics.

There are several important common characteristics of data-intensive computing systems that distinguish them from other forms of computing. First is the principle of collocation of the data and programs or algorithms to perform the computation. To achieve high performance in data-intensive computing, it is important to minimize the movement of data. In direct contrast to other types of computing and supercomputing which utilize data stored in a separate repository or servers and transfer the data to the processing system for computation, data-intensive computing uses distributed data and distributed file systems in which data is located across a cluster of processing nodes, and instead of moving the data, the program or algorithm is transferred to the nodes with the data that needs to be processed. This principle – "Move the code to the data" – is extremely effective since program size is usually small in comparison to the large [...]

Aggregate Data Analysis

http://wpc.423a.edgecastcdn.net/00423A/whitepapers/wp_aggregated_data_analysis.pdf

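(4) Separate developers from high-level operations

・Complex operations such as thread control, locks, barriers, and scheduling are kept away from the developer

・Developers do not need to know the details of how the processing is actually carried out

・Instead, they describe the processing according to the Map Reduce programming model

(5) Move the Code to the Data

・The data is not moved; the code is copied to where the data is

・In other words, processing runs where the data resides

・Load on disks and the network is kept as low as possible

[Figure: three Data blocks, each with three mappers running next to it]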

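(6) Avoid random access; use sequential access

・Reads are restricted to sequential reads to make them fast

・In other words, every record of the input file is read

・Random access is very time-consuming

(7) Seamless scalability

・A scalable algorithm (ideally):

- If the data doubles, the processing time doubles

- If the number of servers doubles, the processing time halves

- Processing a few GB on a few machines and a few PB on thousands of machines works the same way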


[Cloud Computing] A Review of MapReduce

http://www.virtual-tech.net/blog/2008/10/mapreduce.html

2. Word Count

MapReduce application development techniques

Words are extracted from Web pages, and the words with a high frequency of occurrence are picked out. The figure shows the flow of processing in MapReduce. In the Map step, the information in each Web page is counted as <word, 1>. In the Shuffle step, the words are split and grouped by initial letter. In the Reduce step, the <word, 1> pairs are brought together and aggregated into <word, number of occurrences>.

Figure: Flow for extracting frequent keywords from Web pages

Web pages:
  Welcome to My HomePage. Thank you.
  Where is your house? ....

Map (extract the words and treat each one as <word, 1>):
  <welcome, 1> <homepage, 1> <you, 1> <go, 1> <where, 1> <your, 1> <house, 1> <homepage, 1> ...

Shuffle (split by initial letter):
  <welcome, 1> <welcome, 1> <where, 1> ...
  <homepage, 1> <homepage, 1> <house, 1> ...
  <you, 1> <your, 1> <your, 1> ...

Reduce (aggregate per word):
  <go, 2> <homepage, 10> <house, 3> <welcome, 8> <where, 7> <you, 4> <your, 5> ...

By applying this processing, the following can also be implemented with MapReduce.

・Attribute extraction for Web pages: weight the words that occur and decide the attributes from their proportions

・Access log analysis: analyze which Web pages receive the most accesses

(2) Mail filtering

MapReduce is applied to build, from mail logs, a list of the mail addresses that send spam. As the figure shows, the Map step applies an evaluation function to the mail data of each individual message (mail address, headers, body) and computes a spam score. The Reduce step aggregates the spam scores per mail address and uses a threshold to decide which mail addresses to filter.
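The word count flow described above can be imitated in a single process. The following is a minimal sketch, not a distributed implementation: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the two sample pages are assumptions made for illustration, and grouping is done per word rather than per initial letter.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit a (word, 1) pair for every word in one document."""
    for word in re.findall(r"[a-z]+", text.lower()):
        yield word, 1


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would before reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Collapse the list of 1s for a word into its number of occurrences."""
    return word, sum(counts)


def word_count(docs):
    pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in sorted(groups.items()))


pages = {
    "page1": "Welcome to My HomePage. Thank you.",
    "page2": "Where is your house? Welcome!",
}
counts = word_count(pages)
print(counts["welcome"])  # 2
```

The mail filtering case has the same shape: the mapper would emit (mail address, spam score) instead of (word, 1), and the reducer would sum the scores and compare them with a threshold.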

The structure of Map Reduce

[1] Split phase

[2] Map & Serialize phase

[3] Combine phase

[4] Partition & Shuffle phase

[5] Sort phase

[6] Reduce phase

(Note) In general, [1] and the Serialize phase are not counted as part of MR

[Figure: two mappers, each followed by a combiner, emit k1..k4 with value lists; the partitioner and shuffle route them to two reducers, each of which sorts its input]

[1] Split phase

[Split phase]

・The data is split so that multiple map tasks can run

・In Hadoop the data is split into 64MB blocks by default and handed to the mappers

[Figure: Big Data is split and fed to three mappers]

[2] Map & Serialize phase

[Map phase]

・Creates (key, value) pairs from each input

map: (k1, v1) → [(k2, v2)] // [] denotes a list

// word count
class Mapper
  method Map(docid a, doc d)
    for all term t ∈ doc d do
      Emit(term t, count 1)

[2] Map & Serialize phase

[Serialize phase]

・The data type of a value is unrestricted (integer, string, array, hash, ...)

・Formats that can hold complex structures while keeping network transfer cost low:

- Avro

- Thrift

- Message Pack

- Protocol Buffers

> require 'msgpack'
> msg = [1,2,3].to_msgpack  #=> "\x93\x01\x02\x03"
> MessagePack.unpack(msg)   #=> [1,2,3]

http://avro.apache.org/

http://thrift.apache.org/

http://msgpack.org/

http://code.google.com/intl/ja/apis/protocolbuffers/docs/overview.html

[3] Combine phase

[Combine phase] ( = Local Aggregation phase )

・Runs a Reduce locally on the output of the Map step

// word count
class Combiner
  method Combine(string t, counts [c1, c2, . . .])
    sum ← 0
    for all count c ∈ counts [c1, c2, . . .] do
      sum ← sum + c
    Emit(string t, count sum)

[4] Partition & Shuffle phase

[Shuffle]

・Hands the Map output over to the Reduce step

[Partitioner]

・Decides, for each key, which Reducer it is handed to

・By default the decision is based on a hash value

[Figure: two mappers and their combiners emit (k1, v1) ... (k4, v4); the partitioner and shuffle deliver k1,[v1] ... k4,[v4] to two reducers, each of which sorts its input]
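A hash-based partitioner of the kind described above can be sketched in a few lines. This is an illustration only: `hash_partition` and `shuffle` are hypothetical names, and `zlib.crc32` is used here merely as a deterministic stand-in for whatever hash function a real framework applies.

```python
import zlib


def hash_partition(key: str, num_reducers: int) -> int:
    """Pick the reducer for a key from a hash of the key."""
    # crc32 gives the same value in every process, unlike Python's built-in hash() for str
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Route each (key, value) pair from every mapper to its reducer's input."""
    reducer_inputs = [dict() for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            r = hash_partition(key, num_reducers)
            reducer_inputs[r].setdefault(key, []).append(value)
    return reducer_inputs


mapper_outputs = [
    [("k1", 1), ("k2", 1), ("k3", 1)],
    [("k1", 1), ("k4", 1), ("k2", 1)],
]
inputs = shuffle(mapper_outputs, num_reducers=2)
for r, part in enumerate(inputs):
    print(r, sorted(part.items()))
```

Because the reducer is a pure function of the key, all values for one key meet at the same reducer no matter which mapper produced them.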

[5] Sort phase

[Sort phase]

・When multiple keys are handed to a Reducer, the keys are sorted beforehand

・Hadoop does not sort the value lists

・Google's Map Reduce can, by specifying a secondary sort key

[Figure: k1,[v1] ... k4,[v4] are sorted before entering the two reducers]

[6] Reduce phase

[Reduce phase] (runs after all Map processing has completed)

・Aggregates over a key and its value list

reduce: (k2, [v2]) → [(k3, v3)]

// word count
class Reducer
  method Reduce(term t, counts [c1, c2, . . .])
    sum ← 0
    for all count c ∈ counts [c1, c2, . . .] do
      sum ← sum + c
    Emit(term t, count sum)

2. Map Combine Shuffle Reduce

Figure 2.4: Complete view of MapReduce, illustrating combiners and partitioners in addition to mappers and reducers. Combiners can be viewed as "mini-reducers" in the map phase. Partitioners determine which reducer is responsible for a particular key.

[...] a combiner can significantly reduce the amount of data that needs to be copied over the network, resulting in much faster algorithms.

The complete MapReduce model is shown in Figure 2.4. Output of the mappers are processed by the combiners, which perform local aggregation to cut down on the number of intermediate key-value pairs. The partitioner determines which reducer will be responsible for processing a particular key, and the execution framework uses this information to copy the data to the right location during the shuffle and sort phase.13 Therefore, a complete MapReduce job consists of code for the mapper, reducer, combiner, and partitioner, along with job configuration parameters. The execution framework handles everything else.

13 In Hadoop, partitioners are actually executed before combiners, so while Figure 2.4 is conceptually accurate, it doesn't precisely describe the Hadoop implementation.

[Figure: four mappers, each with a combiner, emit k1..k4 with value lists; the partitioner and shuffle deliver them to two reducers, each of which sorts its input]


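[Reducing traffic in the Shuffle phase]

・The question is how far the data can be shrunk during local processing

・Push as much work as possible into the Mapper and the Combiner

→ Covered in 4. Map Reduce Algorithm Design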


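[Skew in processing load across Reducers]

・The default Partitioner chooses the destination Reducer from the hash of the key

・It does not take the size of the value list into account

・When value-list sizes differ widely between keys, the processing load can become skewed across reducers

・Defining a custom Partitioner prevents this

[Skew in processing load across Reducers]: word count example

・By default the Reducers are balanced in terms of the number of keys

・They are not balanced in terms of the total of the values (occurrences)

Hadoop Tutorial Series, Issue #2: Getting Started With (Customized) Partitioning

http://philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/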



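・The distribution of word occurrences roughly follows Zipf's law

・i.e. the share of the whole taken by the element with the k-th highest frequency is proportional to 1/k

[Zipf distribution]

・The figure on the left plots the logarithm of the frequency

・A very large number of words occur only 1, 2, or 3 times

Hadoop Tutorial Series, Issue #2: Getting Started With (Customized) Partitioning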
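To make the skew concrete, the sketch below builds an idealized Zipf frequency table and spreads the words over reducers. Everything in it is an assumption for illustration: the vocabulary size, the total count, and the round-robin assignment `rank % num_reducers`, which merely stands in for a hash that balances the number of keys.

```python
def zipf_frequencies(num_words: int, total_occurrences: int):
    """Frequency of the k-th most common word, proportional to 1/k."""
    harmonic = sum(1.0 / k for k in range(1, num_words + 1))
    return [total_occurrences / (k * harmonic) for k in range(1, num_words + 1)]


def reducer_loads(freqs, num_reducers):
    """Count keys and sum value-list sizes per reducer under round-robin assignment."""
    keys = [0] * num_reducers
    values = [0.0] * num_reducers
    for index, freq in enumerate(freqs):
        keys[index % num_reducers] += 1
        values[index % num_reducers] += freq
    return keys, values


freqs = zipf_frequencies(num_words=10_000, total_occurrences=1_000_000)
keys, values = reducer_loads(freqs, num_reducers=4)
print(keys)                        # every reducer holds the same number of keys
print([round(v) for v in values])  # but the number of values to process differs
```

Each reducer receives 2,500 keys, yet the reducer that happens to hold the most frequent word has noticeably more values to sum than the others, which is the imbalance the bullets above describe.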






・Example of partitioning that is balanced in terms of the total of the values:

・Define a custom Partitioner

・For example, classify with the frequency (the size of the value list) thresholded at 3

package com.philippeadjiman.hadooptraining;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class MyPartitioner implements Partitioner<IntWritable, Text> {
    @Override
    public int getPartition(IntWritable key, Text value, int numPartitions) {
        /* Pretty ugly hard coded partitioning function. Don't do that in practice, it is just for the sake of understanding. */
        int nbOccurences = key.get();
        if (nbOccurences < 3)
            return 0;
        else
            return 1;
    }

    @Override
    public void configure(JobConf arg0) {
    }
}

Hadoop Tutorial Series, Issue #2: Getting Started With (Customized) Partitioning



[Combiner function]

・A Combiner function must satisfy the following two properties:

(a) Commutative

(b) Associative

[Example]: Sum satisfies (a) and (b). Mean does not satisfy (b).

x + y = y + x

x · y = y · x

(x + y) + z = x + (y + z)

(x · y) · z = x · (y · z)
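The difference between Sum and Mean shows up as soon as the same data is split in two different ways. The following sketch is an illustration; `combine_then_reduce` and `mean_via_pairs` are hypothetical helper names, and the (sum, count) workaround is a common remedy rather than something stated on this slide.

```python
from statistics import mean


def combine_then_reduce(partitions, func):
    """Apply func to each partition (combiner), then to the partial results (reducer)."""
    partials = [func(p) for p in partitions]
    return func(partials)


def mean_via_pairs(partitions):
    """Carry (sum, count) pairs instead of means so the partials can be merged safely."""
    partials = [(sum(p), len(p)) for p in partitions]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


split_a = [[1, 2, 3], [4, 5, 6]]
split_b = [[1], [2, 3, 4, 5, 6]]

# Sum: the same answer no matter how the data is split.
print(combine_then_reduce(split_a, sum), combine_then_reduce(split_b, sum))    # 21 21

# Mean: the answer depends on the split, so Mean cannot serve as a combiner directly.
print(combine_then_reduce(split_a, mean), combine_then_reduce(split_b, mean))  # 3.5 2.5

# With (sum, count) pairs both splits give the true mean.
print(mean_via_pairs(split_a), mean_via_pairs(split_b))                        # 3.5 3.5
```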



The pseudocode style changes from here on, apologies...

(1) Local Aggregation

(2) Sorting

(3) Simple Statistics

(4) Sampling

(5) Continuous Map Reduce

(6) Join


(1) Local Aggregation

[Three kinds of Local Aggregation]

[a] In-Mapper Combine

- combining inside the mapper

[b] Combine

- the combine step

[c] Local Combine

- combining within the same server

[Figure: three mappers on one server, each with its own combiner, feed a further combiner labelled [c] Local Combine]

(1)[a] In-Mapper Combine

[In-Mapper Combining]

・Uses an associative array or a similar structure

・Keeps aggregating until each Mapper's allowed memory capacity is reached, and emits at that point

class Mapper {
  buffer

  init() {
    buffer = HashMap.new
  }

  map(id, data) {
    elements = process(data)
    for each element {
      ....
      check_and_put(buffer, k2, v2)
    }
  }

  check_and_put(buffer, k2, v2) {
    if buffer.full {
      // flush the partial aggregates and start again with an empty buffer
      for each k in buffer.keys {
        emit(k, buffer[k])
      }
      buffer.clear()
    }
    buffer.incrby(k2, v2) // H[k2] += v2
  }

  close() {
    // emit whatever is still buffered when the mapper finishes
    for each k in buffer.keys {
      emit(k, buffer[k])
    }
  }
}

Designing algorithms for Map Reduce

http://horicky.blogspot.com/2010/08/designing-algorithmis-for-map-reduce.html
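For readers who want to run the idea, here is a minimal Python sketch of in-mapper combining for word count. It is an illustration under stated assumptions: the class name `InMapperCombiner`, the `emit` callback, and the use of a key-count limit (`max_keys`) in place of a real memory limit are all hypothetical.

```python
from collections import defaultdict


class InMapperCombiner:
    def __init__(self, emit, max_keys=1000):
        self.emit = emit            # callback standing in for the framework's emit
        self.max_keys = max_keys    # stand-in for the mapper's memory limit
        self.buffer = defaultdict(int)

    def map(self, doc_id, text):
        for word in text.split():
            self._check_and_put(word, 1)

    def _check_and_put(self, key, value):
        # Flush first if a new key would not fit, then always record the pair.
        if key not in self.buffer and len(self.buffer) >= self.max_keys:
            self._flush()
        self.buffer[key] += value

    def _flush(self):
        for key, value in self.buffer.items():
            self.emit(key, value)
        self.buffer.clear()

    def close(self):
        """Emit whatever is still buffered when the mapper finishes."""
        self._flush()


emitted = []
mapper = InMapperCombiner(emit=lambda k, v: emitted.append((k, v)), max_keys=2)
mapper.map("d1", "a b a a c b")
mapper.close()
print(emitted)  # [('a', 3), ('b', 1), ('c', 1), ('b', 1)]
```

Six input tokens become four emitted pairs even with a two-key buffer. The key `b` appears twice in the output because of the early flush, which is why a Combine or Reduce step is still needed downstream.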



(1) Conditions under which an aggregation can be decomposed into [a] + [b]

Figure 1: Distributed execution plan for MapReduce when reduce cannot be decomposed to perform partial aggregation.

With this user-defined function, and merge and grouping operators provided by the system, it is possible to execute a simple distributed computation as shown in Figure 1. The computation has exactly two phases: the first phase executes a Map function on the inputs to extract keys and records, then performs a partitioning of these outputs based on the keys of the records. The second phase collects and merges all the records with the same key, and passes them to the Reduce function. (This second phase is equivalent to GroupBy followed by Aggregate in the database literature.)

As we shall see in the following sections, many optimizations for distributed aggregation rely on computing and combining "partial aggregations." Suppose that aggregating the sequence Rk of all the records with a particular key k results in output Sk. A partial aggregation computed from a subsequence r of Rk is an intermediate result with the property that partial aggregations of all the subsequences of Rk can be combined to generate Sk. Partial aggregations may exist, for example, when the aggregation function is commutative and associative, and Section 2.2 below formalizes the notion of decomposable functions which generalize this case. For our running example of integer average, a partial aggregate contains a partial sum and a partial count:

struct Partial {
    int partialSum;
    int partialCount;
}

Often the partial aggregation of a subsequence r is much smaller than r itself: in the case of average for example the partial sum is just two values, regardless of the number of integers that have been processed. When there is such substantial data reduction, partial aggregation can be introduced both as part of the initial Map phase and in an aggregation tree, as shown in Figure 2, to greatly reduce network traffic. In order to decompose a user-defined aggregation using partial aggregation it is necessary to introduce auxiliary functions, called "Combiners" in [9], that synthesize the intermediate results into the final output. The MapReduce system described in [9] can perform partial aggregation on each local computer before transmitting data across the network, but does not use an aggregation tree.

Figure 2: Distributed execution plan for MapReduce when reduce supports partial aggregation. The implementation of GroupBy in the first stage may be different to that in the later stages, as discussed in Section 5.

In order to enable partial aggregation a user of MapReduce must supply three functions:

1. InitialReduce: ⟨K, Sequence of R⟩ → ⟨K, X⟩ which takes a sequence of records of type R, all with the same key of type K, and outputs a partial aggregation encoded as the key of type K and an intermediate type X.

2. Combine: ⟨K, Sequence of X⟩ → ⟨K, X⟩ which takes a sequence of partial aggregations of type [...]



・Using In-Mapper Combine together with Combine makes aggregation efficient

・Whether a Mapper -> Reducer aggregation can be decomposed into such a pipeline can be checked by looking at the following properties

Def. 1

x: data items, x1 ⊕ x2: concatenation of x1, x2.

A function H is "decomposable" when two functions I and C satisfy the following:

1) ∀x1, x2 : H(x1 ⊕ x2) = C(I(x1 ⊕ x2)) = C(I(x1) ⊕ I(x2))

2) ∀x1, x2 : I(x1 ⊕ x2) = I(x2 ⊕ x1)

3) ∀x1, x2 : C(x1 ⊕ x2) = C(x2 ⊕ x1)

Def. 2

A function H is "associative-decomposable" when it satisfies 1-3 of Def. 1 and C additionally satisfies:

4) ∀x1, x2, x3 : C(C(x1 ⊕ x2) ⊕ x3) = C(x1 ⊕ C(x2 ⊕ x3))

( i.e. C is associative )


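[Decomposable]

・Aggregation: when H can be expressed with a set of "associative-decomposable" functions, the processing plan can be decomposed into In-Mapper Combine: I and Combine: C

・The function I on the previous slide corresponds to Initial-Reduce (= In-Mapper Combine), and C corresponds to Combine.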
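The four conditions of Def. 1 and Def. 2 can be spot-checked numerically for a candidate pair (I, C). The sketch below does this for H = "(min, max) of the data"; the choice of H, the function names, and the random-testing approach are assumptions for illustration, and passing random trials is evidence rather than a proof. Here ⊕ is modelled as Python list concatenation.

```python
import random


def initial_reduce(items):
    """I: summarize a chunk of raw numbers as a one-element list holding (min, max)."""
    return [(min(items), max(items))]


def combine(partials):
    """C: merge a list of (min, max) partials into a one-element list."""
    return [(min(lo for lo, _ in partials), max(hi for _, hi in partials))]


def h(items):
    """H: the aggregation computed in one pass over all the data."""
    return [(min(items), max(items))]


def check_decomposable(trials=200, seed=0):
    rng = random.Random(seed)

    def chunk():
        return [rng.randint(-100, 100) for _ in range(rng.randint(1, 8))]

    for _ in range(trials):
        x1, x2, x3 = chunk(), chunk(), chunk()
        i1, i2, i3 = initial_reduce(x1), initial_reduce(x2), initial_reduce(x3)
        # 1) H(x1 ⊕ x2) = C(I(x1 ⊕ x2)) = C(I(x1) ⊕ I(x2))
        assert h(x1 + x2) == combine(initial_reduce(x1 + x2)) == combine(i1 + i2)
        # 2) I is commutative
        assert initial_reduce(x1 + x2) == initial_reduce(x2 + x1)
        # 3) C is commutative
        assert combine(i1 + i2) == combine(i2 + i1)
        # 4) C is associative
        assert combine(combine(i1 + i2) + i3) == combine(i1 + combine(i2 + i3))
    return True


print(check_decomposable())  # True
```

Replacing the pair with a plain mean for both I and C makes condition 1 fail on most random inputs, which matches the earlier remark that Mean is not associative.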

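(1)[c] Local Combine

[Local Combine]

・Aggregates the output of the Map tasks within the local server

・In principle, Mappers cannot communicate or share data with each other even on the same machine

・So a shared memory (or disk) space is set up

・For example, by using the aggregate operations on Redis hash and set types

[Figure: three mappers on one server, each with its own combiner, feed a shared local combiner]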

(1)[c] Local Combine

class Combiner {
  share_space

  init(share_space_info) {
    share_space = conn(share_space_info)
  }

  combine(key, elements) {
    sum = 0
    for each element {
      ...
      sum += v
    }
    // accumulate into the space shared by all mappers on this server
    share_space.incrby(key, sum)
    emit(key, share_space_info)
  } // end combine()
}

class Reducer {
  reduce(key, list_of_share_space_info) {
    for each share_space_info {
      share_space = conn(share_space_info)
      sum = 0
      elements = share_space.hget(key)
      for each element {
        ...
      }
    }
  }
}

(2) Sorting

[Tera Sort]: TeraByte Sort on Apache Hadoop

・With a single Reducer, the output is sorted automatically

・To spread the work over multiple Reducers → define your own Partitioner

・Partition the keys into specific ranges and sort within each range

partition(key) {
  range = (KEY_MAX - KEY_MIN) / NUM_OF_REDUCERS
  reducer_no = (key - KEY_MIN) / range
  // keep key == KEY_MAX inside the last range
  return min(reducer_no, NUM_OF_REDUCERS - 1)
}

k0 in [KEY_MIN, r0), k1 in [r0, r1), ..., kn in [rn, KEY_MAX]

Designing algorithms for Map Reduce

http://sortbenchmark.org/YahooHadoop.pdf
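The range-partitioning idea can be tried out in a single process. The sketch below is an illustration with hypothetical names (`range_partition`, `distributed_sort`); it uses equal-width ranges exactly as in the pseudocode, which balances the reducers only when keys are spread roughly evenly between the minimum and the maximum.

```python
import random


def range_partition(key, key_min, key_max, num_reducers):
    """Map a key to the reducer that owns its range; the top key goes to the last reducer."""
    width = (key_max - key_min) / num_reducers
    return min(int((key - key_min) / width), num_reducers - 1)


def distributed_sort(keys, num_reducers):
    key_min, key_max = min(keys), max(keys)
    if key_min == key_max:
        return sorted(keys)
    buckets = [[] for _ in range(num_reducers)]
    for key in keys:
        buckets[range_partition(key, key_min, key_max, num_reducers)].append(key)
    # Each reducer sorts only its own range; reading the reducers in order
    # then yields a globally sorted sequence.
    return [key for bucket in buckets for key in sorted(bucket)]


rng = random.Random(1)
data = [rng.randint(0, 10_000) for _ in range(1_000)]
result = distributed_sort(data, num_reducers=4)
print(result == sorted(data))  # True
```

No merge step is needed at the end because every key in reducer i is smaller than or equal to every key in reducer i + 1.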



(2) Sorting

[Secondary Sort]

・Example: sensor data

・t: timestamp, m: sensor id, r: sensor reading

・We want the Reducer to receive the data per sensor and sorted by timestamp; the following does not work:

(t1, m1, r80521), (t1, m2, r14209), (t1, m3, r76042),
(t2, m1, r21823), (t2, m2, r66508), (t2, m3, r98347), ...

map: m1 → (t1, r80521) // grouped per sensor

// the values may not arrive in the order t1, t2, t3, ...
(m1) → [(t1, r80521), (t3, r146925), (t2, r21823)]
(m2) → [(t2, r66508), (t1, r14209), (t3, r14720)]

・"value-to-key conversion" design pattern

・Define the Partitioner so that keys with the same sensor m are processed by the same Reducer

・Define the sort order as sensor id first, then timestamp

・Note, however, that the r values end up split across multiple keys, so the two approaches have to be chosen according to the use case

map: (m1, t1) → r80521

// received in the order t1, t2, t3, ...
(m1, t1) → [(r80521)]
(m1, t2) → [(r21823)]
(m1, t3) → [(r146925)]
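The value-to-key conversion pattern can be simulated as follows. This is a sketch under assumptions: timestamps are plain integers, the byte-sum in `partition` is only a deterministic stand-in for a real hash, and the in-memory `sort` plays the role of the framework's sort phase.

```python
from itertools import groupby


def map_phase(records):
    """Value-to-key conversion: move the timestamp from the value into the key."""
    for t, m, r in records:
        yield (m, t), r


def partition(composite_key, num_reducers):
    """Partition on the sensor id only, so all timestamps of a sensor meet at one reducer."""
    sensor_id, _ = composite_key
    return sum(sensor_id.encode("utf-8")) % num_reducers


def run(records, num_reducers=2):
    reducer_inputs = [[] for _ in range(num_reducers)]
    for key, value in map_phase(records):
        reducer_inputs[partition(key, num_reducers)].append((key, value))
    result = {}
    for pairs in reducer_inputs:
        pairs.sort(key=lambda kv: kv[0])  # framework sort on (sensor id, timestamp)
        for sensor_id, group in groupby(pairs, key=lambda kv: kv[0][0]):
            result[sensor_id] = [(key[1], value) for key, value in group]
    return result


records = [
    (1, "m1", 80521), (1, "m2", 14209), (1, "m3", 76042),
    (3, "m1", 146925), (2, "m1", 21823), (2, "m2", 66508), (2, "m3", 98347),
]
by_sensor = run(records)
print(by_sensor["m1"])  # [(1, 80521), (2, 21823), (3, 146925)]
```

The reducer never sorts a value list itself; the ordering comes entirely from sorting the composite keys, which is the point of the pattern.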

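(3) Simple Statistics

[Min, Max]

class Mapper {
  buffer

  map(id, number) {
    buffer.append(number)
    if (buffer.is_full) {
      max = compute_max(buffer)
      emit(1, max)
      buffer.clear() // start a new batch
    }
  }

  close() {
    // flush the last, partially filled batch
    if (!buffer.is_empty) {
      emit(1, compute_max(buffer))
    }
  }
}

class Combiner {
  combine(key, list_of_local_max) {
    local_max = maximum(list_of_local_max)
    emit(1, local_max)
  } // Max() is commutative and associative
}

class Reducer {
  reduce(key, list_of_local_max) {
    global_max = -infinity // starting from 0 would be wrong for all-negative input
    for local_max in list_of_local_max {
      if local_max > global_max {
        global_max = local_max
      }
    }
    emit(1, global_max)
  }
}

[Mean, Var]

・For Mean, a local_mean cannot be merged directly, so the Combiner computes local_sum (together with the local count) and the Reducer computes global_sum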
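For Mean and Var, one workable set of partials is (count, sum, sum of squares). The sketch below assumes that choice; it is not taken from the slides, the function names are hypothetical, and the sum-of-squares formula can lose precision on data with a large mean and a small spread.

```python
def combine(numbers):
    """Local partial for one batch: (count, sum, sum of squares)."""
    return (len(numbers), sum(numbers), sum(x * x for x in numbers))


def reduce_partials(partials):
    """Merge the partials and derive the global mean and population variance."""
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    ss = sum(p[2] for p in partials)
    mean = s / n
    variance = ss / n - mean * mean
    return mean, variance


batches = [[1, 2, 3], [4, 5, 6, 7, 8]]
mean, variance = reduce_partials([combine(b) for b in batches])
print(mean, variance)  # 4.5 5.25
```

Each partial is three numbers regardless of batch size, and merging partials is plain addition, so the same function could also serve as a Combiner between the Mapper and the Reducer.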




(4) Sampling

[Random sampling]: extract 1/10 of the whole uniformly at random

・Note that all of the data is still read in the end

class Mapper {
  map(id, data) {
    key, value = process(data)
    if rand() < 0.1 { // rand() ∈ [0.0, 1.0)
      emit(key, value)
    }
  }
}
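A runnable version of the sampling mapper is short. In this sketch the generator `sampling_mapper`, its `seed` parameter, and the synthetic records are assumptions for illustration.

```python
import random


def sampling_mapper(records, rate=0.1, seed=None):
    """Emit each record independently with probability `rate`."""
    rng = random.Random(seed)
    for key, value in records:
        if rng.random() < rate:  # random() is in [0.0, 1.0)
            yield key, value


records = [(i, f"row-{i}") for i in range(100_000)]
sample = list(sampling_mapper(records, rate=0.1, seed=42))
print(len(sample))
```

The sample size is close to, but not exactly, one tenth of the input, because every record is kept or dropped independently. The mapper still iterates over all 100,000 records, as the slide points out.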

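(4) Sampling

[Weighted sampling]

[Figure: categories A, B, C, D]

・Sampling that takes into account the share of the whole held by some key (a category, for example)

・Because statistics on the whole data set are required, this may not be possible in a single MapReduce pass?

・An efficient Map Reduce formulation is under consideration...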

(5) Continuous Map Reduce

[Continuous Map Reduce]

・Map Reduce was originally a "batch-oriented" design

・Consider a model in which Mappers and Reducers keep processing data that accumulates in real time

・Reduce does not wait for all Map processing to finish (because Map processing never stops)

・As soon as each Mapper finishes a piece of work, it "pushes" the data to the Reducer and lets the Reducer sort it

・In this model, however, Combine cannot be performed, i.e. the load falls on Sort and Reduce

・Once the load on the Reduce side grows, it is effective for the Mapper to buffer data instead of sending it and to run Combine on the buffer in the meantime (adaptive flow control mechanism)

・The paper "MapReduce Online" proposes the Continuous model as the Hadoop Online Prototype (HOP)

http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-136.pdf

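(5) Time Window Concept

・Introduces a model that is more realistic than the Continuous model above

[Time Slice]

・The interval over which data is accumulated before being passed to the Reducer (e.g. 1 hour, 1 minute)

・Each Mapper accumulates and processes the data of the interval defined by the Slice and sends it to the Reducer

[Time Range]

・The span to aggregate over (e.g. 24 hours, 60 minutes)

[Types of Time Window]

・A window with a clear origin: it resets when the day or month rolls over

・A window that keeps the present as its origin, such as the most recent 24 hours

Map Reduce and Stream Processing

http://horicky.blogspot.com/2010/11/map-reduce-and-stream-processing.html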


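(5) Incremental Processing

[Incremental processing]

・Once the Reducer has received the same Slice (its data) from every Mapper, it processes it and merges the result into the existing result data

・When the Range Value is updated, i.e. when the Window moves, old Slices (their data) that are no longer needed are unmerged

[Figure: three mappers send Slice Values to a reducer; Merge and UnMerge turn the Old Range Value into the New Range Value]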

(5) Incremental Processing

[Example]

・24-hour sliding window model

・Range = 24, Slice = 1

・Compute the "hit rate" over the most recent 24 hours, i.e. (total hits) / 24

# Call at each hit record
map(k1, hitRecord) {
  site = hitRecord.site
  # look up the current slice for this key (= site)
  slice = lookupSlice(site)
  if (now - slice.time > 60.minutes) {
    # Notify reducer whole slice of site is sent
    advance(site, slice)
    slice = lookupSlice(site)
  }
  emitIntermediate(site, slice, 1)
}

combine(site, slice, countList) {
  hitCount = 0
  for count in countList {
    hitCount += count
  }
  # Send the message to the downstream node
  emitIntermediate(site, slice, hitCount)
}

# Runs once the slice has been received from every mapper
reduce(site, slice, countList) {
  hitCount = 0
  for count in countList {
    hitCount += count
  }
  sv = SliceValue.new
  sv.hitCount = hitCount
  return sv
}

# When the window boundary moves
init(slice) {
  rangeValue = RangeValue.new
  rangeValue.hitCount = 0
  return rangeValue
}

# Each time a Reduce completes
merge(rangeValue, slice, sliceValue) {
  rangeValue.hitCount += sliceValue.hitCount
}

# When a slice falls out of the sliding window
unmerge(rangeValue, slice, sliceValue) {
  rangeValue.hitCount -= sliceValue.hitCount
}

Map Reduce and Stream Processing
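The merge and unmerge bookkeeping can be exercised on its own. The sketch below is an illustration with hypothetical names (`SlidingWindow`, `range_slices`); it keeps only per-slice hit counts and assumes slices arrive in time order, one per Slice interval.

```python
from collections import deque


class SlidingWindow:
    def __init__(self, range_slices=24):
        self.range_slices = range_slices  # Range expressed in number of slices
        self.slices = deque()             # (slice_id, hit_count) currently inside the window
        self.hit_count = 0                # the range value

    def merge(self, slice_id, slice_hits):
        """Add a finished slice, then drop any slices that fell out of the window."""
        self.slices.append((slice_id, slice_hits))
        self.hit_count += slice_hits
        while len(self.slices) > self.range_slices:
            self._unmerge()

    def _unmerge(self):
        _, old_hits = self.slices.popleft()
        self.hit_count -= old_hits

    def hit_rate(self):
        return self.hit_count / self.range_slices


window = SlidingWindow(range_slices=3)
for slice_id, hits in enumerate([3, 6, 9]):
    window.merge(slice_id, hits)
print(window.hit_rate())  # 6.0

window.merge(3, 12)       # the slice with 3 hits leaves the window
print(window.hit_rate())  # 9.0
```

Updating the range value costs one addition and one subtraction per slice, independent of how many hits the window contains, which is the benefit of the incremental approach.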



(5) In-situ Map Reduce

・Introduced in a paper by the SynopticSystemsLab:

・In-situ MapReduce for Log Processing

[Concepts]

・In-Network Data Processing

・Lossy Map Reduce Processing

・Continuous Map Reduce

- Window Processing With Panes

[Figure: first page of the paper below]

No Pane, No Gain: Efficient Evaluation of Sliding-Window Aggregates over Data Streams

SIGMOD Record, Vol. 34, No. 1, March 2005

http://cseweb.ucsd.edu/~kyocum/synsys/mortar-wide-scale-stream-pr.html

http://www.cs.ucsd.edu/~kyocum/pubs/USENIX-2011-CR.pdf

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.5207&rep=rep1&type=pdf




(6) Join

[Join]

・General reducer-side join

・Optimized reducer-side join

・Map-side partition join

・Map-side partition merge join

・Memcache join

Details in a later session

5. About the Advanced Edition

[Preview of next time]: a selection from the following...

・Map Reduce for machine learning, graphs, and matrix operations

・Implementation examples

・Showing that Map Reduce is inefficient in some cases, and introducing and comparing other computation models such as BSP

・A detailed introduction to In-situ Map Reduce

・An introduction to Dryad

Map Reduce for k-means

K-Means Clustering in Map Reduce

http://horicky.blogspot.com/2011/04/k-means-clustering-in-map-reduce.html

Map Reduce for Classifier

A Comparison of Approaches for Large-Scale Data Mining

Figure 2: MapReduce Classifier Training and Evaluation Procedure

Table 1: Numbers of HTML Pages from Eight Categories.

Category  Art      Business  Computer  Game    Health  Home    Science  Society
Number    176,340  188,100   88,830    39,560  43,680  22,281  85,197   81,620

Table 2: Data mining accuracy of Hadoop approach.

Category  Round1  Round2  Round3  Round4  Round5  Round6  Round7  Round8  Round9  Round10
Art       80.09%  80.12%  81.18%  80.31%  80.20%  80.17%  80.86%  79.70%  79.79%  80.06%
Business  55.82%  58.43%  53.77%  57.30%  57.13%  59.62%  54.98%  58.36%  59.15%  57.98%
Computer  82.37%  81.93%  82.88%  82.68%  82.48%  82.18%  82.81%  82.22%  81.69%  82.14%
Game      78.83%  78.84%  76.86%  77.70%  78.45%  78.94%  78.17%  79.78%  78.49%  79.02%
Health    78.77%  80.49%  79.68%  80.42%  79.95%  80.33%  79.76%  79.71%  80.14%  81.27%
Home      68.01%  66.78%  67.97%  66.41%  67.16%  67.12%  67.49%  67.13%  66.16%  67.44%
Science   48.47%  50.64%  49.98%  50.19%  49.26%  47.89%  48.34%  49.32%  49.62%  48.53%
Society   63.07%  61.00%  61.10%  61.62%  61.77%  61.50%  61.92%  62.16%  62.16%  61.39%

http://www.utdallas.edu/~bxt043000/Publications/Technical-Reports/UTDCS-24-10.pdf

BSP For Graph Processing

Google Pregel Graph Processing

http://horicky.blogspot.com/2010/07/google-pregel-graph-processing.html

Dryad

Thank you very much
