Google and Cloud Computing. Wang Yonggang (王咏刚), Senior Engineer, Google.

Post on 11-Jan-2016

Google and Cloud Computing

Wang Yonggang (王咏刚)

Senior Engineer, Google

Agenda

• The Internet: From Hardware to Community

• The Innovation: A Computing Cloud

• Breakthroughs for Cloud Computing

• Google Apps for Cloud Computing

• Google Infrastructure for Cloud Computing

The Internet: From Hardware to Community

MySpace

Facebook

Kaixin001 (开心网), Xiaonei (校内网), …

What Do Today’s Users Want?

• Accessibility
– Access from anywhere and from multiple devices

• Shareability
– Make sharing as easy as creating and saving

• Freedom
– Users don’t want their data held hostage

• Simplicity
– Easy-to-learn, easy-to-use

• Security
– Trust that data will not be lost or seen by unwanted parties

The Innovation: A Computing Cloud

Cloud Computing

Attributes of Cloud Computing

• Data stored on the cloud

• Software & services on the cloud - Access via web browser

• Based on standards and protocols - Linux, AJAX, LAMP, etc.

• Accessible from any device

Personal PC (hardware-centric) → Client/Server (software-centric) → Cloud Computing (service-centric)

Breakthroughs for Cloud Computing

1. User-Centric

2. Task-Centric

3. Powerful

4. Intelligent

5. Affordable

6. Programmable

User Centric

Data stored in the “Cloud”

Data follows you & your devices

Data accessible anywhere

Data can be shared with others

[Figure labels: music, preferences, maps, news, contacts, messages, mailing lists, photos, e-mails, calendar, phone numbers, investments]

Example: Gmail

– Just a web browser and your account with password!
– Once you log in, the device is “yours”.
– Data stored on remote servers in the “cloud” (with large capacity)

Beijing, on travel

San Francisco, Monday

Home, Wednesday

Use Google Docs to Solve a Task

Access your docs from anywhere

Chat with others in real time

Changes instantly appear to other collaborators

Task = “Teachers creating a departmental curriculum”

Communication Task – Email, Chat, Contacts, Chat History

Task: Collaborate on Spreadsheet – Communicate

Chat with others editing the spreadsheet

Task: Collaborate on Spreadsheet – Collaborate

Invite others to collaborate on the spreadsheet

Task: Collaborate on Spreadsheet – Publish

Invite others to view the spreadsheet

You can also easily organize all your common tasks

Cloud Computing is Powerful: It can do what no PC can do

Is Google Search faster than search in Windows/Outlook/Word?

• And Google Search has to solve a much harder problem…

How much storage does it take to store all of the web pages?

• 100 billion pages × 10 KB per page = 1,000 TB (about 1 PB) of disk!

Cloud computing has at its disposal

• Essentially infinite amount of disk

• Essentially infinite amount of computation

• (Assuming they can be parallelized)
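The storage estimate above (100 billion pages at 10 KB each) can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and decimal units (1 KB = 10^3 bytes, 1 TB = 10^12 bytes) are an assumption, since the slide does not say which convention it uses.

```python
# Back-of-the-envelope check of the slide's storage estimate.
# Assumption: decimal units (1 KB = 10**3 bytes, 1 TB = 10**12 bytes).
PAGES = 100 * 10**9           # 100 billion web pages
BYTES_PER_PAGE = 10 * 10**3   # 10 KB per page


def total_storage_tb(pages: int, bytes_per_page: int) -> float:
    """Return the raw storage needed, in decimal terabytes."""
    return pages * bytes_per_page / 10**12


if __name__ == "__main__":
    print(total_storage_tb(PAGES, BYTES_PER_PAGE))  # 1000.0
```

The figure covers a single copy of the raw pages only; replication and indexes would add to it.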

Example: Google Search

Web Page Search → Universal Search

1st Generation: era of single search – not diverse

2nd Generation: era of vertical search – too complex

3rd Generation: an era of Universal Search

From vertical search to universal search: integration of user experience

Universal Search Example

Cloud Computing Infrastructure

GFS Architecture

[Chart labels: Google 48%, MSN 19%, Yahoo 33%]

• Files broken into chunks (typically 64 MB)

• Master manages metadata

• Data transfers happen directly between clients/chunkservers

[Diagram: many clients; a GFS master with replica masters; chunkservers 1 … N. Each chunk is stored on several chunkservers: Chunkserver 1 holds C0, C1, C2, C5; Chunkserver 2 holds C1, C3, C5; Chunkserver N holds C0, C2, C5.]
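The read path described above (metadata from the master, data directly from a chunkserver) can be sketched as a toy model. Everything here is hypothetical and for illustration only: `ToyMaster`, `ToyChunkserver`, and `client_read` are made-up names, not the GFS API, and the sketch ignores replication policy, failures, and reads that cross a chunk boundary. Only the 64 MB chunk size comes from the slide.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the typical chunk size from the slide


def chunk_index(byte_offset: int) -> int:
    """Map a byte offset within a file to the index of its chunk."""
    return byte_offset // CHUNK_SIZE


@dataclass
class ToyMaster:
    """Hypothetical master: holds metadata only, never file data."""
    # (filename, chunk index) -> names of chunkservers holding a replica
    locations: dict = field(default_factory=dict)

    def lookup(self, filename: str, byte_offset: int):
        idx = chunk_index(byte_offset)
        return idx, self.locations[(filename, idx)]


@dataclass
class ToyChunkserver:
    """Hypothetical chunkserver: stores chunk contents."""
    name: str
    chunks: dict = field(default_factory=dict)  # (filename, idx) -> bytes

    def read(self, filename: str, idx: int, start: int, length: int) -> bytes:
        return self.chunks[(filename, idx)][start:start + length]


def client_read(master, servers, filename, byte_offset, length):
    # Step 1: ask the master where the chunk lives (metadata only).
    idx, replicas = master.lookup(filename, byte_offset)
    # Step 2: fetch the bytes directly from one replica, bypassing the master.
    server = servers[replicas[0]]
    return server.read(filename, idx, byte_offset % CHUNK_SIZE, length)


if __name__ == "__main__":
    master = ToyMaster({("/logs/a", 1): ["cs2"]})
    servers = {"cs2": ToyChunkserver("cs2", {("/logs/a", 1): b"hello world"})}
    print(client_read(master, servers, "/logs/a", CHUNK_SIZE + 6, 5))  # b'world'
```

Keeping the master out of the data path is what lets a single metadata server coordinate many clients: it answers small lookups while the bulk bytes flow between clients and chunkservers.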

Typical Cluster

[Diagram: cluster-wide services are the scheduling masters, the GFS master, and the lock service. Each machine (Machine 1, Machine 2, …, Machine N) runs Linux, a GFS chunkserver, a scheduler slave, and several user applications (user app 1, user app 2, user app 3).]

MapReduce

More specifically…

• Programmer specifies two primary methods:
– map(k, v) → <k', v'>*
– reduce(k', <v'>*) → <k', v'>*

• All v' with same k' are reduced together, in order.

• Usually also specify:
– partition(k', total partitions) → partition for k'
– often a simple hash of the key
– allows reduce operations for different k' to be parallelized
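The three functions above can be illustrated with a small single-process word count. This is a sketch under stated assumptions, not Google's MapReduce library: `map_fn`, `reduce_fn`, `partition`, and `run_mapreduce` are names chosen for this example, the "shuffle" is an in-memory dictionary, and `zlib.crc32` stands in for "a simple hash of the key".

```python
from collections import defaultdict
from zlib import crc32


def map_fn(key, value):
    """map(k, v) -> <k', v'>*: emit (word, 1) for every word in a line."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """reduce(k', <v'>*) -> <k', v'>*: sum the counts for one word."""
    yield key, sum(values)


def partition(key, total_partitions):
    """Pick a partition for k' using a simple, deterministic hash."""
    return crc32(key.encode("utf-8")) % total_partitions


def run_mapreduce(records, total_partitions=2):
    # Map phase plus shuffle: group every v' by partition, then by k'.
    partitions = [defaultdict(list) for _ in range(total_partitions)]
    for k, v in records:
        for k2, v2 in map_fn(k, v):
            partitions[partition(k2, total_partitions)][k2].append(v2)

    # Reduce phase: partitions are independent, so in a real system each
    # could run on a different machine. Keys are visited in sorted order.
    results = {}
    for part in partitions:
        for k2 in sorted(part):
            for out_key, out_value in reduce_fn(k2, part[k2]):
                results[out_key] = out_value
    return results


if __name__ == "__main__":
    lines = [(0, "the cloud"), (1, "The web the cloud")]
    print(run_mapreduce(lines))
```

Because every occurrence of a key hashes to the same partition, each reducer sees all values for its keys, which is what allows reductions for different keys to proceed in parallel.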

BigTable

• Distributed multi-level map
– With an interesting data model

• Fault-tolerant, persistent

• Scalable
– Thousands of servers
– Terabytes of in-memory data
– Petabytes of disk-based data
– Millions of reads/writes per second, efficient scans

• Self-managing
– Servers can be added/removed dynamically
– Servers adjust to load imbalance

BigTable: Basic Data Model

• Distributed multi-dimensional sparse map

(row, column, timestamp) → cell contents

• Good match for most of our applications

[Figure: the row “www.cnn.com” and the column “contents” identify a cell holding “<html>…”, with versions at timestamps t1, t2, t3. Axes: ROWS, COLUMNS, TIMESTAMPS.]
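The (row, column, timestamp) → cell contents mapping can be mimicked with nested dictionaries. This is a minimal in-memory sketch, not the Bigtable API: `ToyTable` and its methods are hypothetical names, integer timestamps 1, 2, 3 stand in for t1, t2, t3, the stored strings are placeholder values, and the "newest version at or before the requested timestamp" read rule is an assumption made for the example.

```python
class ToyTable:
    """Sparse map: (row, column, timestamp) -> cell contents."""

    def __init__(self):
        # Only cells that were written take up space (the map is sparse).
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version, or the newest one at or before `timestamp`."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None

    def scan(self, row_prefix):
        """Yield (row, column, latest value) for rows starting with a prefix."""
        for row, column in sorted(self._cells):
            if row.startswith(row_prefix):
                yield row, column, self.get(row, column)


if __name__ == "__main__":
    table = ToyTable()
    table.put("www.cnn.com", "contents", 1, "<html>v1")  # placeholder values
    table.put("www.cnn.com", "contents", 3, "<html>v3")
    print(table.get("www.cnn.com", "contents"))               # <html>v3
    print(table.get("www.cnn.com", "contents", timestamp=2))  # <html>v1
```

Keeping rows in sorted order is what makes prefix scans cheap in this toy; the slides list efficient scans as a property of the real system but do not describe how it achieves them.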

BigTable: System Architecture

Bigtable cell:

• Bigtable master: performs metadata ops, load balancing

• Bigtable tablet servers: serve data

• Cluster scheduling master: handles failover, monitoring

• GFS: holds tablet data, logs

• Lock service: holds metadata, handles master election

• Bigtable client, via the Bigtable client library: Open()

Thanks

Q&A