Putting Hadoop on Any Cloud (Big Data Spain)

Post on 15-Jan-2015

description

The massive computing and storage resources needed to support big data applications make cloud environments an ideal fit. Now more than ever, there is a growing number of cloud infrastructure providers to choose from: Amazon AWS, OpenStack offered by the likes of HP, Rackspace and soon even Dell, VMware vCloud as well a... INCLUDING:
- Effectively managing your Hadoop stack in any data center (on-premise, cloud, hybrid…)
- Maintaining the flexibility to choose the right cloud for the job in an ever-changing environment
- Consistently managing your Hadoop deployment alongside other elements of your Big Data system, such as NoSQL DB, Web Tier, etc.

Transcript of Putting Hadoop on Any Cloud (Big Data Spain)

The Elephant in the Cloud

Putting Hadoop on Any Cloud

@natishalom

Columbus & The Cloud

THE DISCOVERY OF AMERICA

THE THING THAT MADE IT POSSIBLE

Why Cloud Portability Matters

Cloud Portability Myth #1

No one really needs cloud portability

Cloud Portability

Facts

Zynga moved ~80% of their workload from Amazon to their private zCloud

“own the base, rent the spike”

http://code.zynga.com/2012/02/the-evolution-of-zcloud/

Cloud Portability

Facts

Mixpanel started with Linode, then moved to Rackspace, then to AWS

http://code.mixpanel.com/2010/11/08/amazon-vs-rackspace/

Cloud Portability

Facts

• You want the flexibility to choose what’s right for you, when it’s right for you

• Based on pricing, features, availability, performance, etc.

Cloud Portability Myth #2

Cloud Portability == Cloud API Standardization

Cloud APIs, Today

• Standard APIs (?): OCCI, vCloud

• OSS frameworks: OpenStack, CloudStack, Eucalyptus

• Abstraction frameworks: jclouds, Deltacloud, Fog, Libvirt

Cloud APIs, Today

• Standard APIs: Not practical in the foreseeable future

• OSS projects: Need a couple more years to converge & mature

• Abstraction frameworks: Probably the only practical (near-term) option
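To make the abstraction-framework option concrete, here is a minimal sketch of the idea in Python. It does not use the API of jclouds, Deltacloud, Fog, or Libvirt; `CloudDriver`, `InMemoryDriver`, `Machine`, and `launch_cluster` are hypothetical names invented for illustration, and the driver only tracks machines in memory instead of calling a real provider.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from itertools import count


@dataclass
class Machine:
    """Provider-neutral description of a running VM."""
    machine_id: str
    provider: str
    size: str


class CloudDriver(ABC):
    """Hypothetical interface: the only surface the app layer depends on."""

    @abstractmethod
    def start_machine(self, size: str) -> Machine: ...

    @abstractmethod
    def stop_machine(self, machine_id: str) -> None: ...


class InMemoryDriver(CloudDriver):
    """Stand-in for a real provider driver; it only tracks machines in a dict."""

    def __init__(self, provider: str) -> None:
        self.provider = provider
        self.machines: dict[str, Machine] = {}
        self._ids = count(1)

    def start_machine(self, size: str) -> Machine:
        machine = Machine(f"{self.provider}-{next(self._ids)}", self.provider, size)
        self.machines[machine.machine_id] = machine
        return machine

    def stop_machine(self, machine_id: str) -> None:
        # Stopping an unknown machine is treated as a no-op.
        self.machines.pop(machine_id, None)


def launch_cluster(driver: CloudDriver, nodes: int, size: str = "medium") -> list[Machine]:
    """Application code sees only CloudDriver, never a provider-specific API."""
    return [driver.start_machine(size) for _ in range(nodes)]
```

Moving from one provider to another then means passing a different driver to `launch_cluster`; the calling code stays the same.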

Realization: What You Really Care about Is App Portability

OS is the same on any cloud

Most clouds have compute & storage

Elasticity & scaling have same effects on the app, regardless of the cloud

Cloud Portability Myth #3

All infrastructure clouds were born equal

Food for Thought

Offerings can vary quite a bit:

• Amazon guarantees only 99.5% uptime

• RackSpace will give you $$$ every time they crash

• Joyent claims to be significantly faster than both
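As a rough illustration of what an uptime figure means in practice, the sketch below converts a percentage into permitted downtime per year. Only the 99.5% figure comes from the slide; the other two values are common comparison points added here as assumptions, and `allowed_downtime_hours` is a helper name made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(uptime_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted by an uptime guarantee over the period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


# 99.5 is the slide's figure; 99.9 and 99.99 are illustrative assumptions.
for sla in (99.5, 99.9, 99.99):
    print(f"{sla}% uptime allows {allowed_downtime_hours(sla):.2f} h/year")
```

At 99.5%, the guarantee still permits roughly 43.8 hours of downtime in a year.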

And Some Features Are Unique…

Amazon is the only major vendor to offer SSD storage. Netflix says it’s:

• ½ the price for the same throughput

• ⅕ the latency on avg.

• Even slowest requests are 6x faster

http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html

Let’s Talk Big Data on the Cloud

A Typical Big Data App…

Managing Big Data on the Cloud

• Auto start VMs

• Install and configure app components

• Monitor

• Repair

• (Auto) Scale

• Burst…
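The monitor, repair, and scale items above can be pictured as one decision pass over the cluster state. The following is a sketch under stated assumptions, not how any particular tool implements it: `Node`, `plan_actions`, and all thresholds are hypothetical, and the function only returns a plan rather than touching real machines.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool
    load: float  # utilization between 0.0 and 1.0


def plan_actions(nodes, min_nodes=2, max_nodes=10, scale_out_at=0.8, scale_in_at=0.3):
    """Return a list of (action, target) tuples for one monitoring pass."""
    actions = []

    # Repair: replace every node that failed its health check.
    for node in nodes:
        if not node.healthy:
            actions.append(("replace", node.name))

    healthy = [n for n in nodes if n.healthy]
    # Replaced nodes still count toward the planned cluster size.
    planned = len(nodes)

    # Auto start: bring the cluster up to its minimum size first.
    if planned < min_nodes:
        actions.append(("start", min_nodes - planned))
        return actions

    if not healthy:
        return actions

    avg_load = sum(n.load for n in healthy) / len(healthy)
    # Auto scale: add a node under pressure, remove one when mostly idle.
    if avg_load > scale_out_at and planned < max_nodes:
        actions.append(("start", 1))
    elif avg_load < scale_in_at and planned > min_nodes:
        actions.append(("stop", 1))
    return actions
```

Keeping the planning step a pure function makes it easy to test the policy separately from whatever actually starts and stops VMs.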

The Challenges

Consistent Management

Making deployment, installation, scaling, and fail-over look the same across the entire stack

The Challenges (Cont.)

Cloud Portability

Choosing the Right Cloud for the Job

Running bare metal for high-I/O workloads, public cloud for sporadic workloads
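A placement rule like this can be written down as a small policy function. The sketch below is only an illustration: `Workload` and `choose_environment` are hypothetical names, and both the precedence of the I/O rule over the sporadic rule and the private-cloud default are assumptions that the slide does not state.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    io_intensive: bool  # e.g. heavy disk traffic
    sporadic: bool      # runs occasionally rather than continuously


def choose_environment(workload: Workload) -> str:
    """Pick a target environment using the slide's two rules of thumb."""
    # Assumption: the I/O rule wins when both flags are set.
    if workload.io_intensive:
        return "bare-metal"
    if workload.sporadic:
        return "public-cloud"
    # Assumption: steady, non-I/O-bound work defaults to a private cloud.
    return "private-cloud"
```

For example, `choose_environment(Workload("nightly-report", io_intensive=False, sporadic=True))` selects the public cloud.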

Hadoop

• Available under different distributions

• Cloudera

• IBM BigInsights

• MapR

• Hortonworks

Big Data Apps, on Any Cloud, Your Way

Open source (Apache2)

Putting Cloudify and Hadoop Together

• Run on Any Cloud

• Consistent MGT

• Dynamic Scaling

• Auto Recovery

• Auto Scaling

• Role Assignments

• Monitoring

• Simple maintenance

How It Works

1. Upload your recipe.

2. Cloudify creates VMs and installs agents.

3. Agents install and manage your app.

4. Cloudify automates the scaling.
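The four steps can be walked through with a toy simulation. This is not Cloudify's recipe format or API: `Recipe`, `Agent`, `Orchestrator`, and the load threshold are all hypothetical, and VMs are represented by plain strings so the flow can be followed without any cloud account.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """Hypothetical recipe: which service to run and how many instances."""
    service: str
    instances: int
    max_instances: int


@dataclass
class Agent:
    vm: str
    installed: list = field(default_factory=list)

    def install(self, service: str) -> None:
        # Step 3: the agent installs (and would then manage) the service.
        self.installed.append(service)


class Orchestrator:
    def __init__(self) -> None:
        self.recipe = None
        self.agents = []

    def upload(self, recipe: Recipe) -> None:
        # Step 1: the recipe is the only input the user provides.
        self.recipe = recipe
        for _ in range(recipe.instances):
            self._add_instance()

    def _add_instance(self) -> None:
        # Step 2: create a VM (here just a name) and place an agent on it.
        agent = Agent(vm=f"vm-{len(self.agents) + 1}")
        agent.install(self.recipe.service)
        self.agents.append(agent)

    def on_load(self, avg_load: float, threshold: float = 0.8) -> None:
        # Step 4: scale out automatically while the recipe's limit allows.
        if avg_load > threshold and len(self.agents) < self.recipe.max_instances:
            self._add_instance()
```

Uploading `Recipe("hadoop-datanode", instances=2, max_instances=3)` starts two simulated VMs, and a later `on_load(0.9)` call adds a third.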

A Few Snippets