Hadoop & Neptune Feb. 2009 김형준.


Page 1: Hadoop & Neptune Feb. 2009   김형준.

Hadoop & Neptune

Feb. 2009

http://www.openneptune.com

http://www.jaso.co.kr

김형준

Page 2: Hadoop & Neptune Feb. 2009   김형준.

The Data 'Tsunami'

Page 3: Hadoop & Neptune Feb. 2009   김형준.

More CPU

Faster Disk

Program Tuning

More Memory

Page 4: Hadoop & Neptune Feb. 2009   김형준.

Uninstall

Page 5: Hadoop & Neptune Feb. 2009   김형준.

Where? Distributed File System

How? Distributed/Parallel Computing

Page 6: Hadoop & Neptune Feb. 2009   김형준.

Hadoop DFS

Unlimited Storage

No Backup, Self-healing

Thousands of Nodes

But, No POSIX

No Random write

Page 7: Hadoop & Neptune Feb. 2009   김형준.

Cluster diagram (legend: machine, daemon process)

NameNode (DFS Master)

JobTracker (Job Master)

SecondaryNameNode

Three slave machines, each with: DataNode (DFS Slave), TaskTracker (Task Mgmt.), Local Disk

Client (API): control and data connections

Page 8: Hadoop & Neptune Feb. 2009   김형준.

Hadoop MapReduce

1TB group by -> 10 minutes

More machines -> 1 minute

Page 9: Hadoop & Neptune Feb. 2009   김형준.

• map (k1, v1) → list(k2, v2)

• reduce (k2, list(v2)) → result value

Input: This is a book. That book is on the desk. I like that book.

Split:

This is a book. That book is on the desk.

I like that book.

map():

(This, 1) (book, 1) (That, 1) (book, 1) …

(I, 1) (that, 1) (book, 1) …

Partition, Merge, Sort:

(book, [1,1,1]) … (is, [1,1]) … (This, [1])

reduce():

(book, 3) … (is, 2) … (This, 1)

Distributed/parallel execution

Map&Reduce execution platform
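The word-count flow on this slide can be sketched in a few lines of single-process Python. This is only an illustration of the map, partition/merge/sort, and reduce stages, not Hadoop's API: the names `map_fn`, `reduce_fn`, and `run_job` are hypothetical, and the word tokenization rule is an assumption.

```python
import re
from collections import defaultdict


def map_fn(_key, line):
    """map (k1, v1) -> list(k2, v2): emit (word, 1) for each word in the line."""
    # Assumption: a word is a run of ASCII letters; case is preserved,
    # so "That" and "that" stay separate keys, as on the slide.
    return [(word, 1) for word in re.findall(r"[A-Za-z]+", line)]


def reduce_fn(word, counts):
    """reduce (k2, list(v2)) -> result value: sum the counts for one word."""
    return word, sum(counts)


def run_job(splits):
    """Run map on every split, group values by key, then reduce each group."""
    # Map phase: one map call per input split.
    intermediate = []
    for offset, line in enumerate(splits):
        intermediate.extend(map_fn(offset, line))

    # Partition/merge/sort phase: collect all values that share a key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: one reduce call per key, visited in sorted key order.
    return dict(reduce_fn(key, groups[key]) for key in sorted(groups))


splits = [
    "This is a book. That book is on the desk.",
    "I like that book.",
]
result = run_job(splits)
print(result["book"], result["is"], result["This"])
```

On a real cluster the map calls run on different machines and the grouping step moves data over the network; the sketch keeps everything in one process to show only the data flow.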

Page 10: Hadoop & Neptune Feb. 2009   김형준.

Cluster diagram (legend: machine, daemon process)

NameNode (DFS Master)

JobTracker (Job Master)

SecondaryNameNode

Three slave machines, each with: DataNode (DFS Slave), TaskTracker (Task Mgmt.), Local Disk

Client (API): control and data connections

Page 11: Hadoop & Neptune Feb. 2009   김형준.

A piece of Cake

Page 12: Hadoop & Neptune Feb. 2009   김형준.

Neptune

Database running on DFS (Hadoop)

Unlimited Structured Data

No Backup

But, No JOIN, No SQL

No Multiple row operation

No Aggregation function

Page 13: Hadoop & Neptune Feb. 2009   김형준.

Operation

Create/Drop Table

put/get

like/between

scan/merge scan (join)

MapReduce
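To make the operation list concrete, here is a toy in-memory table keyed by sorted row keys. It is not Neptune's API: `ToyTable`, `merge_scan`, and all method signatures are hypothetical, and treating `like` as a row-key prefix match is an assumption. The point is only that put/get, range and prefix lookups, ordered scans, and a merge-scan join all follow naturally from keeping rows sorted by key.

```python
import bisect


class ToyTable:
    """Hypothetical single-node stand-in for a table sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept sorted
        self._rows = {}   # row key -> value

    def put(self, row_key, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def between(self, start, end):
        """Rows with start <= key <= end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

    def like(self, prefix):
        """Rows whose key starts with prefix (assumed meaning of 'like')."""
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for k in self._keys[start:]:
            if not k.startswith(prefix):
                break
            matches.append((k, self._rows[k]))
        return matches

    def scan(self):
        """Yield every row in key order."""
        for k in self._keys:
            yield k, self._rows[k]


def merge_scan(left, right):
    """Join two tables on row key by walking both sorted scans in step."""
    left_rows, right_rows = left.scan(), right.scan()
    l, r = next(left_rows, None), next(right_rows, None)
    while l is not None and r is not None:
        if l[0] == r[0]:
            yield l[0], l[1], r[1]
            l, r = next(left_rows, None), next(right_rows, None)
        elif l[0] < r[0]:
            l = next(left_rows, None)
        else:
            r = next(right_rows, None)


users = ToyTable()
users.put("user:002", "Lee")
users.put("user:001", "Kim")
users.put("page:001", "home")

visits = ToyTable()
visits.put("user:001", 12)
visits.put("user:003", 4)

joined = list(merge_scan(users, visits))
print(users.like("user:"), joined)
```

Because both scans are already sorted, the join needs a single pass over each table and no index lookups.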

Page 14: Hadoop & Neptune Feb. 2009   김형준.

Why Neptune?

Distributed/parallel processing: Speed, Scalability

Diagram: MapReduce over Neptune tables

Make Map&Reduce function

Run on Map&Reduce framework (JobTracker)

META Table: Get tablet list

TableA: Tablet A-1, Tablet A-2, Tablet A-3, ..., Tablet A-N

Task assign to each node: Map Tasks on TaskTrackers

Reduce Tasks on TaskTrackers

TableB: Tablet B-1, Tablet B-2
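The "get tablet list, then assign tasks to each node" step can be sketched as one map task per tablet, spread over the available TaskTrackers. This is a simplified illustration under stated assumptions: `plan_map_tasks`, the dictionary-shaped META table, and the tracker names are hypothetical, and round-robin assignment is a stand-in for whatever placement policy the real scheduler uses.

```python
from itertools import cycle


def plan_map_tasks(meta_table, table_name, task_trackers):
    """Assign one map task per tablet, round-robin across TaskTrackers."""
    plan = {tracker: [] for tracker in task_trackers}
    tablets = meta_table[table_name]  # "META Table: get tablet list"
    for tablet, tracker in zip(tablets, cycle(task_trackers)):
        plan[tracker].append(tablet)
    return plan


# Hypothetical META table contents: table name -> tablets in key order.
meta_table = {"TableA": ["A-1", "A-2", "A-3", "A-4"]}
task_trackers = ["tt1", "tt2", "tt3"]

plan = plan_map_tasks(meta_table, "TableA", task_trackers)
for tracker, tablets in plan.items():
    print(tracker, tablets)
```

Since each tablet is an independent unit of input, adding TaskTrackers spreads the same tablets over more machines, which is where the speed and scalability on this slide come from.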

Page 15: Hadoop & Neptune Feb. 2009   김형준.

Distributed file system (Hadoop or other)

TabletServer #1

TabletServer #2

TabletServer #n

Cluster Management System

NeptuneMaster

Distributed/parallel computing platform (Hadoop)

User application

Neptune (large-scale distributed data store)

Logical Table

Physical storage

Page 16: Hadoop & Neptune Feb. 2009   김형준.

When to use Neptune

Large Data

Online put/get and analysis

Less complex

Google Personalized Search

Google Analytics

Page 17: Hadoop & Neptune Feb. 2009   김형준.

Finding developer

Page 18: Hadoop & Neptune Feb. 2009   김형준.

Principles of Google Infra

Cheap Hardware and Smart Software

• Use cheap commodity hardware with frequent failures

• Develop smart software to reduce the cost of failure

Easy Management

• High Scalability by automatic discovery of new servers and racks

• High Redundancy for failure of servers, racks, even data centers

Speed and Then More Speed

• High speed with low cost

• Rapid development and deployment of new products

Use existing technologies

• Use techniques from the leading edge of computer science

• Use open source code as a starting point

Page 19: Hadoop & Neptune Feb. 2009   김형준.

Google Infra

Google's core competency

Google's software stack

Google Linux

GFS

Bigtable

Map & Reduce

Client API

Chubby

Cluster Mgmt

Batch application

Online Services

Hardware

Low-end commodity servers

40 or more pizza box servers per rack

Page 20: Hadoop & Neptune Feb. 2009   김형준.

Q&A