Google and Cloud Computing. Wang Yonggang (王咏刚), Senior Engineer, Google.

Agenda

• The Internet: From Hardware to Community

• The Innovation: A Computing Cloud

• Breakthroughs for Cloud Computing

• Google Apps for Cloud Computing

• Google Infrastructure for Cloud Computing

The Internet: From Hardware to Community

MySpace

Facebook

Kaixin (开心网), Xiaonei (校内网), ...

What Do Today’s Users Want?

• Accessibility: access from anywhere and from multiple devices

• Shareability: make sharing as easy as creating and saving

• Freedom: users don’t want their data held hostage

• Simplicity: easy to learn, easy to use

• Security: trust that data will not be lost or seen by unwanted parties

The Innovation: A Computing Cloud

Cloud Computing

Attributes of Cloud Computing

• Data stored on the cloud

• Software & services on the cloud
  – Access via web browser

• Based on standards and protocols
  – Linux, AJAX, LAMP, etc.

• Accessible from any device

Hardware Centric: Personal PC

Software Centric: Client/Server

Service Centric: Cloud Computing

Breakthroughs for Cloud Computing

1. User-Centric

2. Task-Centric

3. Powerful

4. Intelligent

5. Affordable

6. Programmable

User Centric

Data stored in the “Cloud”

Data follows you & your devices

Data accessible anywhere

Data can be shared with others

Data types shown: music, preferences, maps, news, contacts, messages, mailing lists, photos, e-mails, calendar, phone numbers, investments

Example: Gmail

– Just a web browser and your account with password!
– Once you log in, the device is “yours”.
– Data stored on remote servers in the “cloud” (with large capacity)

Beijing, on travel

San Francisco, Monday

Home, Wednesday

Use Google Docs to Solve a Task

Access your docs from anywhere

Chat with others in real time

Changes instantly appear to other collaborators

Task = “Teachers creating a departmental curriculum”

Communication Task – Email, Chat, Contacts, Chat History

Task: Collaborate on Spreadsheet – Communicate

Chat with others editing the spreadsheet

Task: Collaborate on Spreadsheet – Collaborate

Invite others to collaborate on the spreadsheet

Task: Collaborate on Spreadsheet – Publish

Invite others to view the spreadsheet

You can also easily organize all your common tasks

Cloud Computing is Powerful: It can do what no PC can do

Is Google Search faster than search in Windows/Outlook/Word?

• And searching the whole web must be a much harder problem…

How much storage does it take to store all of the web pages?

• 100 billion pages × 10 KB per page = 1,000 TB (about 1 PB) of disk!
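
The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the page count and page size are the slide's round figures, and decimal units (1 KB = 10^3 bytes, 1 TB = 10^12 bytes) are an assumption.

```python
# Back-of-the-envelope storage estimate using the slide's round numbers.
# Assumption: decimal units (1 KB = 10**3 bytes, 1 TB = 10**12 bytes).
PAGES = 100 * 10**9          # 100 billion web pages
BYTES_PER_PAGE = 10 * 10**3  # 10 KB per page

total_bytes = PAGES * BYTES_PER_PAGE
total_tb = total_bytes / 10**12
total_pb = total_bytes / 10**15

print(f"{total_tb:,.0f} TB (about {total_pb:.0f} PB)")
```

The figure covers a single raw copy of the pages only; indexes and replicas would add to it.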

Cloud computing has at its disposal

• Essentially infinite amount of disk

• Essentially infinite amount of computation

• (Assuming the work can be parallelized)

Example: Google Search

Web Page Search Universal Search

1st Generation: era of single search – not diverse

2nd Generation: era of vertical search – too complex

3rd Generation: an era of Universal Search

From vertical search to universal search

Integration of user experience

Universal Search Example

Cloud Computing Infrastructure

GFS Architecture

Google 48%, MSN 19%, Yahoo 33%

• Files broken into chunks (typically 64 MB)

• Master manages metadata

• Data transfers happen directly between clients/chunkservers

[Diagram: clients, a GFS master with replica masters, and chunkservers 1 to N. Chunkserver 1 holds chunks C0, C1, C2, C5; Chunkserver 2 holds C1, C3, C5; Chunkserver N holds C0, C2, C5.]
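
A minimal in-memory sketch of the three points on this slide: files split into chunks, a master that holds only metadata, and clients moving bytes directly to and from chunkservers. The names `Master`, `Chunkserver`, `write_file` and `read_file` are hypothetical and are not the GFS API. The chunk size is shrunk to 4 bytes so the example runs instantly, and the round-robin replica placement is an assumption made for illustration.

```python
CHUNK_SIZE = 4  # bytes, for illustration only; the slide says typically 64 MB


class Chunkserver:
    """Toy chunkserver: stores chunk bytes keyed by chunk id."""

    def __init__(self):
        self.chunks = {}


class Master:
    """Toy master: holds metadata only, never the file bytes."""

    def __init__(self, chunkservers, replicas=2):
        self.chunkservers = chunkservers
        self.replicas = replicas
        self.files = {}      # filename -> ordered list of chunk ids
        self.locations = {}  # chunk id -> indexes of chunkservers holding it
        self.next_id = 0

    def allocate(self, filename):
        """Register a new chunk for a file and choose servers for its replicas."""
        chunk_id = self.next_id
        self.next_id += 1
        n = len(self.chunkservers)
        servers = [(chunk_id + i) % n for i in range(self.replicas)]  # assumed placement
        self.files.setdefault(filename, []).append(chunk_id)
        self.locations[chunk_id] = servers
        return chunk_id, servers


def write_file(master, filename, data):
    """Client write: get placement from the master, send bytes to chunkservers."""
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk_id, servers = master.allocate(filename)
        for s in servers:
            master.chunkservers[s].chunks[chunk_id] = data[offset:offset + CHUNK_SIZE]


def read_file(master, filename):
    """Client read: look up chunk locations, then fetch from one replica each."""
    parts = []
    for chunk_id in master.files.get(filename, []):
        server = master.locations[chunk_id][0]
        parts.append(master.chunkservers[server].chunks[chunk_id])
    return b"".join(parts)


servers = [Chunkserver() for _ in range(3)]
master = Master(servers)
write_file(master, "/logs/a", b"hello, cloud")
print(read_file(master, "/logs/a"))
```

Note that the master object never stores file contents in this sketch; it only records which chunk ids belong to a file and where the replicas live.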

Typical Cluster

[Diagram: cluster-wide services are the scheduling masters, the GFS master, and the lock service. Each machine (Machine 1, Machine 2, ... Machine N) runs Linux, a GFS chunkserver, a scheduler slave, and one or more user apps (user app 1, user app 2, user app 3).]

MapReduce

More specifically…

• Programmer specifies two primary methods:
  – map(k, v) → <k', v'>*
  – reduce(k', <v'>*) → <k', v'>*

• All v' with same k' are reduced together, in order.

• Usually also specify:
  – partition(k', total partitions) → partition for k'
    • often a simple hash of the key
    • allows reduce operations for different k' to be parallelized
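
The three functions above can be sketched with a word count run in a single process. This is not Google's MapReduce API: `map_fn`, `reduce_fn`, `partition` and `run_mapreduce` are hypothetical names, the CRC32 hash is an assumed choice of "simple hash", and the partitions are reduced one after another here rather than on separate machines.

```python
from collections import defaultdict
import zlib


def map_fn(key, value):
    """map(k, v) -> <k', v'>*: emit (word, 1) for every word in a line."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """reduce(k', <v'>*) -> <k', v'>*: sum the counts for one word."""
    yield key, sum(values)


def partition(key, total_partitions):
    """Simple stable hash, so a given k' always lands in the same partition."""
    return zlib.crc32(key.encode("utf-8")) % total_partitions


def run_mapreduce(records, total_partitions=3):
    # Map phase: route each intermediate pair to the partition for its key.
    partitions = [defaultdict(list) for _ in range(total_partitions)]
    for k, v in records:
        for k2, v2 in map_fn(k, v):
            partitions[partition(k2, total_partitions)][k2].append(v2)
    # Reduce phase: partitions share no keys, so they could run in parallel.
    # Within a partition, keys are reduced in sorted order.
    results = {}
    for part in partitions:
        for k2 in sorted(part):
            for out_key, out_value in reduce_fn(k2, part[k2]):
                results[out_key] = out_value
    return results


records = [(0, "the cloud is the computer"), (1, "The cloud scales")]
counts = run_mapreduce(records)
print(counts["the"], counts["cloud"])
```

Because every occurrence of a key hashes to the same partition, the result does not depend on how many partitions are used.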

BigTable

• Distributed multi-level map
  – With an interesting data model

• Fault-tolerant, persistent

• Scalable
  – Thousands of servers
  – Terabytes of in-memory data
  – Petabyte of disk-based data
  – Millions of reads/writes per second, efficient scans

• Self-managing
  – Servers can be added/removed dynamically
  – Servers adjust to load imbalance

BigTable: Basic Data Model

• Distributed multi-dimensional sparse map
  – (row, column, timestamp) → cell contents

• Good match for most of our applications

[Diagram: row “www.cnn.com”, column “contents”, timestamps t1, t2, t3; the cell holds “<html>…”. The three axes are ROWS, COLUMNS, and TIMESTAMPS.]
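
A toy sketch of the (row, column, timestamp) → cell contents map, using the www.cnn.com example from the slide. `SparseTable`, `put` and `get` are hypothetical names and not the Bigtable client API. Integer timestamps stand in for t1, t2 and t3, and returning the newest version at or before a requested timestamp is an assumed read rule.

```python
class SparseTable:
    """Toy sparse map: only cells that were written take up any memory."""

    def __init__(self):
        self.cells = {}  # (row, column) -> {timestamp: contents}

    def put(self, row, column, timestamp, contents):
        """Store one version of a cell."""
        self.cells.setdefault((row, column), {})[timestamp] = contents

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (newest if None)."""
        versions = self.cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None


table = SparseTable()
table.put("www.cnn.com", "contents", 1, "<html>v1")
table.put("www.cnn.com", "contents", 2, "<html>v2")
table.put("www.cnn.com", "contents", 3, "<html>v3")

latest = table.get("www.cnn.com", "contents")
as_of_t2 = table.get("www.cnn.com", "contents", timestamp=2)
missing = table.get("www.cnn.com", "anchor")
print(latest, as_of_t2, missing)
```

A column that was never written for a row simply has no entry, which is what makes the map sparse.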

BigTable: System Architecture

Bigtable cell:

• Bigtable master: performs metadata ops, load balancing

• Bigtable tablet servers (several): serve data

• Cluster scheduling master: handles failover, monitoring

• GFS: holds tablet data, logs

• Lock service: holds metadata, handles master-election

Bigtable client, using the Bigtable client library: Open()

Thanks

Q&A