A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Transcript of A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat

Page 1: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat

Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

A Comprehensive Approach to Building Your Big Data Solution

We do Hadoop.

Page 2: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Speakers

•  Hortonworks: Ali Bajwa, Senior Partner Solution Engineer

•  Red Hat: Irshad Raihan, Senior Principal, Product Marketing

•  Cisco: Ron Graham, Big Data Analytics Engineer

Page 3: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Partnership

•  100% open source Hadoop distribution, support and training

•  Middleware, storage, PaaS, IaaS

•  UCS Integrated Infrastructure for Big Data

Cisco, Hortonworks and Red Hat are partnering to help you build your big data solution and reach massive scalability, superior efficiency and dramatically lower total cost of ownership thanks to a validated joint architecture.

Page 4: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Traditional systems under pressure

Challenges
•  Constrains data to app
•  Can’t manage new data
•  Costly to scale

[Figure: business value of traditional data (ERP, CRM, SCM) versus new data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs). Data volume: 2.8 zettabytes in 2012, 40 zettabytes in 2020. Chart labels: Laggards, Industry Leaders.]

Page 5: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Modern Data Architecture emerges to unify data & processing

Modern Data Architecture
•  Enables applications to access all your enterprise data through an efficient centralized platform
•  Supported by a centralized approach to governance, security and operations
•  Versatile enough to handle any application and dataset, no matter the size or type

[Figure: Modern Data Architecture. Sources: clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, and existing systems (ERP, CRM, SCM). Platform: HDFS (Hadoop Distributed File System) and YARN: Data Operating System, running batch, interactive, real-time and partner ISV workloads, alongside EDW and MPP systems. Analytics: data marts, applications, business analytics, visualization & dashboards.]

Page 6: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Hadoop Driver: Cost optimization

•  Archive data off EDW: move rarely used data to Hadoop as an active archive and store more data longer

•  Offload costly ETL processes: free your EDW to perform high-value functions like analytics & operations, not ETL

•  Enrich the value of your EDW: use Hadoop to refine new data sources, such as web and machine data, for new analytical context

HDP helps you reduce costs and optimize the value associated with your EDW

[Figure: before and after views of the EDW. Sources: clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, and existing systems (ERP, CRM, SCM). Data systems: HDP 2.2 (ELT; cold data, deeper archive & new sources) and the Enterprise Data Warehouse (hot; MPP; in-memory). Analytics: data marts, business analytics, visualization & dashboards.]

Page 7: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Hadoop Driver: Enabling the data lake

Data Lake Definition
•  Centralized Architecture: multiple applications on a shared data set with consistent levels of service
•  Any App, Any Data: multiple applications accessing all data, affording new insights and opportunities
•  Unlocks ‘Systems of Insight’: advanced algorithms and applications used to derive new value and optimize existing value

Drivers:
1.  Cost Optimization
2.  Advanced Analytic Apps

Goal:
•  Centralized Architecture
•  Data-driven Business

[Figure: Journey to the Data Lake with Hadoop. Axes: Scale, Scope. Labels: Data Lake, Systems of Insight.]

Page 8: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Only HDP delivers a Centralized Architecture

HDP is uniquely built around YARN serving as a data operating system that provides multi-tenant Resource Management, consistent Governance & Security and efficient Operations services across Hadoop applications.

Hortonworks Data Platform

YARN Data Operating System

•  A centralized architecture of consistent enterprise services for resource management, security, operations, and governance.

•  The versatility to support multiple applications and diverse workloads from batch to interactive to real-time, open source and commercial.

Key Benefits

•  Multiple applications on a shared data set with consistent levels of service: a multitenant data platform.

•  Provides a shared platform to enable new analytic applications.

•  Delivers maximum cost efficiency for cluster resource management: fewer servers, fewer nodes.

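[Figure: Hortonworks Data Platform stack. Storage and YARN: Data Operating System at the base, with Resource Management, Governance, Security and Operations services; Data Access (batch, interactive & real-time) above, serving existing applications, new analytics and partner applications.]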

Page 9: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


HDP delivers a completely open data platform

Hortonworks Data Platform 2.2

Hortonworks Data Platform provides Hadoop for the Enterprise: a centralized architecture of core enterprise services, for any application and any data.

Completely Open

•  HDP incorporates every element required of an enterprise data platform: data storage, data access, governance, security, operations

•  All components are developed in open source and then rigorously tested, certified, and delivered as an integrated open source platform that’s easy to consume and use by the enterprise and ecosystem.

[Figure: HDP 2.2 component stack, built on HDFS (Hadoop Distributed File System) and YARN: Data Operating System (Cluster Resource Management).]

•  Governance: Apache Falcon
•  Batch, interactive & real-time data access: Apache Pig, Apache Hive, Cascading, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm
•  Also shown: Apache Sqoop, Apache Flume, Apache Kafka
•  Security: Apache Ranger, Apache Knox, Apache Falcon
•  Operations: Apache Ambari, Apache Zookeeper, Apache Oozie

Page 10: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


HDP: Any Data, Any Application, Anywhere

Any Application
•  Deep integration with ecosystem partners to extend existing investments and skills
•  Broadest set of applications through the stable of YARN-Ready applications
•  Over 70 Hortonworks Certified YARN Apps

Any Data
Deploy applications fueled by clickstream, sensor, social, mobile, geo-location, server log, and other new paradigm datasets with existing legacy datasets.

Anywhere
Implement HDP naturally across the complete range of deployment options: commodity, appliance, cloud, hybrid.

[Figure: data sources shown are clickstream, web & social, geolocation, Internet of Things, server logs, files and emails, ERP, CRM, SCM.]

Page 11: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Open Source IS the standard for platform technology

Modern platform standards are defined by open communities
•  For Hadoop, the ASF provides guidelines and a governance framework and the open community defines the standards for Hadoop.
•  Roadmap matches user requirements, not vendor monetization requirements

Hortonworks Open Source Development Model yields unmatched efficiency
•  Infinite number of developers under governance of ASF applied to problem
•  End users motivated to contribute to Apache Hadoop as they are consumers
•  IT vendors motivated to align with Apache Hadoop to capture adjacent opportunities

Hortonworks Open Source Business Model de-risks investments
•  Buying behavior changed: enterprise wants support subscription license
•  Vendor needs to earn your business; every year is an election year
•  Equitable balance of power between vendor and consumer
•  IT vendors want platform technologies to be open source to avoid lock-in

Page 12: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat

Red Hat Big Data

Open the possibilities of your data

Page 13: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat

Big Data innovation cannot happen in a bubble

Strong partnerships with industry leaders and open source communities

Page 14: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat

The Old Data Lifecycle

Multiple Silos. Multiple Views. Multiple Goals.

[Figure: roles shown are Business User, Architect, Data Center Operator, App Developer. Activities shown are Manage, Build, Code, Query.]

Page 15: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat

The New Data Lifecycle

One Language. One View. One Goal.

[Figure: roles shown are Business User, Architect, Data Center Operator, App Developer. Stages shown are Ingest, Integrate, Act, Discover.]

Page 16: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat

Barriers to Big Data Success

Lack of agile, open, and cost-effective enterprise-grade solutions

•  I want more than canned BI queries
•  I am locked into a vendor stack
•  I want to use my favorite dev framework
•  I need to integrate data across silos

[Figure: roles shown are Business User, Architect, Data Center Operator, App Developer.]

Page 17: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat

Big Data Solutions from Red Hat

[Figure: roles shown are Business User, Architect, Data Center Operator, App Developer. Stages shown are Ingest, Integrate, Act, Discover.]

Page 18: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat

Integrated Big Data Platform

[Figure: joint platform stack, color-coded by vendor: Hortonworks, Cisco, Red Hat.]

•  Infrastructure: Cisco UCS Integrated Infrastructure for Big Data
•  Operating System: Red Hat Enterprise Linux
•  Virtualization: Red Hat Enterprise Virtualization
•  Cloud: Red Hat Enterprise Linux OpenStack Platform; Cisco OpenStack
•  Hadoop Compatible File System: Red Hat Storage; Hadoop Distributed File System
•  Hadoop Data Processing: Map/Reduce, YARN, Pig, Spark, Storm, HBase, Tez, Hive
•  Analytics
•  Application Platform-as-a-Service: OpenShift by Red Hat
•  Data Integration and Data Services: Red Hat JBoss Data Virtualization; Composite
•  Data Caching: Red Hat JBoss Data Grid
•  Business Rules Mgmt: Red Hat JBoss BRMS
•  Development: Red Hat JBoss Developer Studio
•  Management: Ambari; Cisco UCS Director Express; Cisco Unified Management
•  Cisco Security Suite

Diagram group labels: Operating Environment; Data Integration & Application Development

Page 19: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat

Software and Solutions Innovation Empowering What’s Next

Ron Graham Big Data Analytics Engineer

Hardware Architecture Cisco UCS with Big Data

Page 20: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat

20 © 2014 Cisco and/or its affiliates. All rights reserved. Cisco Confidential


Why Cisco UCS for Big Data?

•  Manageability
•  Save time with UCS Manager
•  Enables consistent and rapid deployments using UCS Service profiles
•  Offers operational simplification
•  Delivers a modular solution
•  Scalability
•  Performance

SIM card: identity for a phone

Service profile: identity for a server

Page 21: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


UCS Director Express for Big Data: End-to-end solution for Hadoop

•  End-to-end provisioning, installation, and monitoring tool for Hadoop clusters

•  Better business outcomes with faster time to value from Big Data

•  Provides an appliance-like experience without the inflexibility

•  Centralized visibility across Hadoop and physical infrastructure

•  Powerful interface for further integration into third-party tools and services

Page 22: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat

Powering Big Data and Analytics

Complete and Industry-leading Portfolio

•  UCS B200 Scale-out Analytics
•  Big Data with EMC Isilon and VCE
•  Invicta (Fast Data)
•  UCS C240 (Hadoop, NoSQL, MPP)
•  C/B460 (In-memory Analytics)
•  UCS C3160, C3260 (Hadoop)
•  UCS C220 (real-time, streaming)
•  FlexPod Select with NetApp E-Series
•  UCS Mini (All-in-one at Edge)
•  UCS M-Series (Massive scale-out)

Management tools and partners named on the slide:
•  UCS Manager, Director, Express, Central, Redhat, ACI
•  Actian, DataStax, Hortonworks, MongoDB, Pivotal, SAP, SAS, Splunk
•  Cisco, Elastic Search, IBM, Informatica, Microsoft, MicroStrategy, Oracle, SAP, SAS and others

Diagram labels: Ecosystem Partners, ISV Partners, Infrastructure Management, Data Management, Applications

Page 23: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Cisco Validated Designs Accelerate Deployment

Cisco Validated Designs for leading big data platforms can be found at: www.cisco.com/go/bigdata

Page 24: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Cisco UCS CPA for Big Data v3 Reference Architecture and Bundles

Starter
•  Server: 8x UCS C220 M4
•  CPU: 2 x Intel Xeon E5-2620 v3 (15M Cache, 2.40 GHz)
•  Memory: 256GB
•  Storage: 8 1.2-TB 10K SAS SFF HDD

High Performance
•  Server: 8x UCS C220 M4
•  CPU: 2 x Intel Xeon E5-2680 v3 (30M Cache, 2.50 GHz)
•  Memory: 384GB
•  Storage: 2 1.2-TB 10K SAS SFF HDD, 6 400-GB SAS SSD

Performance Optimized
•  Server: 16x UCS C240 M4
•  CPU: 2 x Intel Xeon E5-2680 v3 (30M Cache, 2.50 GHz)
•  Memory: 256GB
•  Storage: 2 120-GB SATA SSD, 24 1.2-TB 10K SAS SFF HDD

Capacity Optimized
•  Server: 16x UCS C240 M4
•  CPU: 2 x Intel Xeon E5-2620 v3 (15M Cache, 2.40 GHz)
•  Memory: 128GB
•  Storage: 2 120-GB SATA SSD, 12 4-TB 7.2K SAS SFF HDD

Extreme Capacity
•  Server: 2x UCS C3160
•  CPU: 2 x Intel Xeon E5-2695 v2 (30M Cache, 2.40 GHz)
•  Memory: 256GB
•  Storage: 2 120-GB SATA SSD, 60 4-TB 7.2K SAS SFF HDD

Page 25: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


64 Node Cluster Configuration

2x UCS 6296 Series Fabric Interconnect with UCS Manager

•  UCS Domain (68 Servers)
•  Managed by UCS Manager
•  2.8 PB of storage
•  HDP 2.2
•  Tiered Storage
•  Tez
•  RHEL 6.5
•  Dual 10G Network
•  17 Servers Per Rack

UCS C240 M4
•  2x E5-2680 v3
•  256GB Memory
•  Cisco 12Gb/s SAS RAID Controller
•  2x 120GB SATA SSD
•  24x 1.2TB 10k SAS
•  2x Cisco UCS VIC 1227

UCS C3160
•  2x E5-2695 v2
•  256GB Memory
•  Cisco 12Gb/s SAS RAID Controller
•  2x 120GB SATA SSD
•  60x 4TB 7.2k SAS SFF
•  2x Cisco UCS VIC 1227

[Figure: link labels read "17 10Gb Ethernet", shown twice.]
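
The 2.8 PB figure can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes the 68-server domain is split into 64 UCS C240 M4 nodes and 4 UCS C3160 nodes (the slide gives 68 servers and a 64 node cluster, but not the split), counts only the data drives listed for each server, ignores the 120GB SSDs, and uses decimal units (1 PB = 1000 TB). The result is raw capacity, before any HDFS replication. `ServerSpec`, `raw_capacity_tb` and `ASSUMED_MIX` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ServerSpec:
    """Data-drive layout of one server model, taken from the slide."""
    name: str
    data_drives: int
    drive_tb: float

    @property
    def raw_tb(self) -> float:
        # Raw capacity of one server, in decimal terabytes.
        return self.data_drives * self.drive_tb


C240_M4 = ServerSpec("UCS C240 M4", data_drives=24, drive_tb=1.2)
C3160 = ServerSpec("UCS C3160", data_drives=60, drive_tb=4.0)

# ASSUMPTION: 64 + 4 split of the 68 servers; not stated on the slide.
ASSUMED_MIX = [(C240_M4, 64), (C3160, 4)]
SERVERS_PER_RACK = 17  # from the slide


def raw_capacity_tb(mix: Iterable[Tuple[ServerSpec, int]]) -> float:
    """Sum raw capacity over (server model, count) pairs."""
    return sum(spec.raw_tb * count for spec, count in mix)


total_tb = raw_capacity_tb(ASSUMED_MIX)
total_servers = sum(count for _, count in ASSUMED_MIX)
racks = total_servers // SERVERS_PER_RACK

print(f"{total_servers} servers in {racks} racks, "
      f"{total_tb:.1f} TB raw ({total_tb / 1000:.2f} PB)")
```

Under these assumptions the C240 M4 nodes contribute 64 x 28.8 TB and the C3160 nodes 4 x 240 TB, which together come to roughly 2.8 PB across four racks.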

Page 26: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Unified Management: Programmability, Scalability and Automation

[Figure: components shown are UCSD Express, UCS 6200 Series Fabric Interconnect, UCS Manager, UCS C240 M4 Series Rack Server, UCS C3160 Rack Server, and Apache Ambari.]

Hadoop Cluster Profile Template
Redhat RHEL 6.5, HDFS, CLDB, YARN, ZooKeeper, HBase, Hive, Oozie, Hue, Spark, Key-Value Store Indexer, Solr, Sqoop, Impala, Flume, Pig, Mahout, Falcon, Tez, Storm, Ganglia

Cisco UCS Service Profile
NIC MACs, HBA WWNs, Server UUID, VLAN Assignments, VLAN Tagging, FC Fabrics Assignments, FC Boot Parameters, Number of vNICs, Boot order, PXE settings, IPMI Settings, Number of vHBAs, QoS, Call Home, Template Association, Org & Sub Org Assoc., Server Pool Association, Statistic Thresholds, BIOS scrub actions, Disk scrub actions, BIOS firmware, Adapter firmware, BMC firmware, RAID settings, Advanced NIC settings, Serial over LAN settings, BIOS Settings

Page 27: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Multi-tiered Storage Architecture: Multi-temperature Policy

[Figure: UCS 6200 Series Fabric Interconnect connecting UCS C240 M4 Series Rack Servers (hot tier) and UCS C3160 Rack Servers (cold tier), with data placed across both.]

Policy
•  Hot: for both storage and compute. Data that is popular and still being used for processing stays in this policy. When a block is hot, all (n) replicas are stored on DISK.
•  Warm: partially hot and partially cold. When a block is warm, 1 replica is stored on DISK and the remaining n-1 replicas are stored on ARCHIVE.
•  Cold: only for storage, with limited compute. Data that is no longer being used, or data that needs to be archived, is moved from hot storage to cold storage. When a block is cold, all (n) replicas are stored on ARCHIVE.
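
The replica placement rules above can be written down as a small function. This is a minimal sketch of the rule as stated on the slide, not HDFS code: `StorageType` and `replica_placement` are hypothetical names, and the default replication factor of 3 is an assumption (the slide only says n).

```python
from enum import Enum
from typing import List


class StorageType(Enum):
    DISK = "DISK"
    ARCHIVE = "ARCHIVE"


def replica_placement(policy: str, replication: int = 3) -> List[StorageType]:
    """Return the storage type for each of the n replicas of a block."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    name = policy.strip().upper()
    if name == "HOT":
        # Hot: all n replicas on DISK.
        return [StorageType.DISK] * replication
    if name == "WARM":
        # Warm: 1 replica on DISK, the remaining n-1 on ARCHIVE.
        return [StorageType.DISK] + [StorageType.ARCHIVE] * (replication - 1)
    if name == "COLD":
        # Cold: all n replicas on ARCHIVE.
        return [StorageType.ARCHIVE] * replication
    raise ValueError(f"unknown storage policy: {policy!r}")


if __name__ == "__main__":
    for policy in ("Hot", "Warm", "Cold"):
        placement = replica_placement(policy)
        print(policy, [storage.value for storage in placement])
```

In the cluster described here, DISK would correspond to the C240 M4 tier and ARCHIVE to the denser C3160 tier.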

Page 28: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Multi-tiered Storage Architecture: Multi-temperature Policy

Mover: a new data migration tool
It periodically scans the files in HDFS to check whether the block placement satisfies the storage policy. For blocks that violate the storage policy, it moves the replicas to a different storage type in order to fulfill the storage policy requirement.

[Figure: the same hot (UCS C240 M4) and cold (UCS C3160) tiers behind a UCS 6200 Series Fabric Interconnect, with data blocks labeled A, C, D, E and N being moved between tiers.]
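
The check the Mover performs can be illustrated with a short sketch. This is not the HDFS implementation: it only compares where a block's replicas currently sit with where the Hot, Warm and Cold policies say they should sit, and lists the replica moves needed. `expected_types` and `plan_moves` are hypothetical names, and storage types are plain strings for brevity.

```python
from collections import Counter
from typing import List, Tuple


def expected_types(policy: str, replication: int) -> List[str]:
    """Storage types a block's replicas should occupy under a policy."""
    if replication < 1:
        raise ValueError("a block needs at least one replica")
    name = policy.strip().upper()
    if name == "HOT":
        return ["DISK"] * replication
    if name == "WARM":
        return ["DISK"] + ["ARCHIVE"] * (replication - 1)
    if name == "COLD":
        return ["ARCHIVE"] * replication
    raise ValueError(f"unknown storage policy: {policy!r}")


def plan_moves(current: List[str], policy: str) -> List[Tuple[str, str]]:
    """Return (from_type, to_type) moves that bring a block into compliance.

    `current` lists the storage type holding each existing replica.
    An empty result means the placement already satisfies the policy.
    """
    want = Counter(expected_types(policy, len(current)))
    have = Counter(current)
    # Counter subtraction keeps only positive counts, so these are the
    # replicas sitting on the wrong tier and the slots still unfilled.
    surplus = list((have - want).elements())
    deficit = list((want - have).elements())
    return list(zip(surplus, deficit))


if __name__ == "__main__":
    # A block written while hot whose directory was later marked Cold.
    for move in plan_moves(["DISK", "DISK", "DISK"], "Cold"):
        print("move replica:", move[0], "->", move[1])
```

A real Mover also has to pick concrete target nodes and respect rack placement; the sketch deliberately leaves that out.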

Page 29: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat

Page 29 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Next Steps…

Download the Hortonworks Sandbox
•  Learn Hadoop
•  Build Your Analytic App
•  Try Hadoop

Learn more with our partnerships
•  http://hortonworks.com/partner/cisco/
•  http://hortonworks.com/partner/redhat/
•  Joint CVD: bit.ly/Cisco-CVD

Page 30: A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat


Meet us in person!

•  Cisco Live! in San Diego: June 7-11
•  Hadoop Summit in San Jose: June 9-11
•  Red Hat Summit in Boston: June 23-26

For more information about Red Hat’s Big Data solutions, please visit:
•  redhat.com/bigdata
•  redhatstorage.redhat.com/category/big-data
•  redhat.com/en/insights/big-data

For more information about Cisco’s Big Data and Analytics offers, please visit:
•  www.cisco.com/go/bigdata and www.cisco.com/go/bigdata_design
•  http://blogs.cisco.com/author/raghunathnambiar
•  bit.ly/Cisco-CVD