My First MapReduce (Mi primer map reduce)

Posted on 19-May-2015

Description

A talk on big data and MapReduce by Rubén Orta.

Transcript of My First MapReduce

My First Map/Reduce

Rubén Orta @agileando

1. History
2. Implementation
3. Netflix Prize in Python
4. Links

Big Data = Counting

COUNTING

Jeff Dean

Sanjay Ghemawat

map(key, value):
    new_value = a_function(value)
    return new_key, new_value

reduce(key, value):
    new_value = another_function(value)
    return key, new_value
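
The two signatures above leave the grouping step between map and reduce implicit. As a minimal sketch of the whole flow, assuming an invented dataset of temperature readings, the following runs both phases in a single process. The names `map_reading`, `reduce_max` and `records` are hypothetical and do not come from the talk.

```python
from itertools import groupby
from operator import itemgetter


def map_reading(key, value):
    # key: record id (unused here); value: a "city,temperature" string
    city, temperature = value.split(",")
    return city, int(temperature)


def reduce_max(key, values):
    # values: every temperature emitted for this city
    return key, max(values)


# Invented sample data, for illustration only
records = {1: "Madrid,31", 2: "Vigo,22", 3: "Madrid,35", 4: "Vigo,19"}

# Map phase: one (new_key, new_value) pair per input record
mapped = sorted(map_reading(k, v) for k, v in records.items())

# Group by key (the "shuffle"), then reduce each group to one value
hottest = dict(
    reduce_max(city, [temp for _, temp in pairs])
    for city, pairs in groupby(mapped, key=itemgetter(0))
)
print(hottest)
```

In a real cluster the map calls and the reduce calls would run on different machines; here everything runs in sequence so that only the data flow is visible.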

f() f() f() f() f()

f’() f’() f’() f’() f’()

Dataset: millions of web pages

Map
for each word in document:
    emit (word, 1);

Reduce
total = 0
for each item in value:
    total++
return (key, total);
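
The word-count pseudocode above can be tried locally without any cluster. This is a sketch under stated assumptions, not the talk's implementation: `run_mapreduce` is a hypothetical single-process driver, `map_words` and `reduce_counts` are illustrative names, and the two sample pages are invented.

```python
from collections import defaultdict


def map_words(doc_id, text):
    # Emit (word, 1) for every word in the document
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    # Add up all the 1s emitted for this word
    return sum(counts)


def run_mapreduce(datasource, mapfn, reducefn):
    # Hypothetical in-process driver: map, group by key, then reduce
    grouped = defaultdict(list)
    for key, value in datasource.items():
        for out_key, out_value in mapfn(key, value):
            grouped[out_key].append(out_value)
    return {key: reducefn(key, values) for key, values in grouped.items()}


# Invented sample documents standing in for the web pages
pages = {"page1": "big data is counting", "page2": "counting is big"}
word_counts = run_mapreduce(pages, map_words, reduce_counts)
print(word_counts)
```

The map function is a generator because one document produces many pairs, which is why the pseudocode needs an "emit" per word rather than a single return value.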

import mincemeat

# data_files and read_data are not shown in the slides
data = dict((f, read_data(f)) for f in data_files)

s = mincemeat.Server()
s.datasource = data
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="ruben")

def mapfn(key, value):
    lines = value.splitlines()
    film_id = lines[0][:-1]  # first line is "<film_id>:"; drop the trailing colon
    for line in lines[1:]:
        items = line.split(",")
        user_id = items[0]
        rating = items[1]
        date = items[2]
        yield user_id, film_id

def reducefn(key, values):
    number_of_films = 0
    for value in values:
        number_of_films += 1
    return number_of_films
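
To check what these two functions compute without starting a mincemeat server, they can be run in a single process over a small sample. The sketch below restates `mapfn` and `reducefn` compactly and groups the pairs by hand. The file names and ratings in `sample_files` are invented for illustration and only mimic the layout the map function expects: a "film_id:" line followed by "user_id,rating,date" lines.

```python
from collections import defaultdict


def mapfn(key, value):
    # value: the contents of one film file
    lines = value.splitlines()
    film_id = lines[0][:-1]  # drop the trailing colon
    for line in lines[1:]:
        user_id = line.split(",")[0]
        yield user_id, film_id


def reducefn(key, values):
    # Count how many films this user has rated
    return sum(1 for _ in values)


# Invented sample in the assumed file layout
sample_files = {
    "film_1.txt": "1:\n1488844,3,2005-09-06\n822109,5,2005-05-13\n",
    "film_2.txt": "2:\n1488844,4,2005-10-01\n",
}

# Group the mapped pairs by user, as the framework would do between phases
grouped = defaultdict(list)
for filename, contents in sample_files.items():
    for user_id, film_id in mapfn(filename, contents):
        grouped[user_id].append(film_id)

films_per_user = {user: reducefn(user, films) for user, films in grouped.items()}
print(films_per_user)
```

The result maps each user id to the number of films that user rated, which is the "counting" the talk builds towards.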

Papers

GFS http://research.google.com/archive/gfs.html
MapReduce http://research.google.com/archive/mapreduce.html
BigTable http://research.google.com/archive/bigtable.html
Dynamo http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf
Dremel http://research.google.com/pubs/pub36632.html
Spanner http://research.google.com/archive/spanner.html

Python

MinceMeat.py https://github.com/michaelfairley/mincemeatpy
Octo.py http://code.google.com/p/octopy/
Netflix DataSet http://www.lifecrunch.biz/archives/207

Rubén Orta

http://www.slideshare.net/agileando/mi-primer-map-reduce

Blog http://devspoke.com/
Twitter https://twitter.com/agileando
GitHub https://github.com/rubenorta

WE ARE LOOKING FOR PEOPLE TO JOIN OUR TEAM

Want to join?

*unix, scripting (python, perl), devops