A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat
Transcript of A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks and Red Hat
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
A Comprehensive Approach to Building Your Big Data Solution
We do Hadoop.
Speakers
Hortonworks ◦ Ali Bajwa, Senior Partner Solution Engineer
Red Hat ◦ Irshad Raihan, Senior Principal, Product Marketing
Cisco ◦ Ron Graham, Big Data Analytics Engineer
Partnership
100% open source Hadoop Distribution, Support and Training
Middleware, Storage, PaaS, IaaS
UCS Integrated Infrastructure For Big Data
CISCO, HORTONWORKS AND RED HAT ARE PARTNERING TO HELP YOU BUILD YOUR BIG DATA SOLUTION AND REACH MASSIVE SCALABILITY, SUPERIOR EFFICIENCY AND DRAMATICALLY LOWER TOTAL COST OF OWNERSHIP THANKS TO A VALIDATED JOINT ARCHITECTURE.
Traditional systems under pressure
Challenges
• Constrains data to app
• Can’t manage new data
• Costly to Scale
Business Value
Clickstream
Geolocation
Web Data
Internet of Things
Docs, emails
Server logs
2012 2.8 Zettabytes
2020 40 Zettabytes
LAGGARDS
INDUSTRY LEADERS
1
2 New Data
ERP CRM SCM
New
Traditional
Modern Data Architecture emerges to unify data & processing
Modern Data Architecture
• Enable applications to have access to all your enterprise data through an efficient centralized platform
• Supported with a centralized approach to governance, security and operations
• Versatile to handle any applications and datasets no matter the size or type
SOURCES: Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured; Existing Systems (ERP, CRM, SCM)
HDFS (Hadoop Distributed File System)
YARN: Data Operating System (Interactive, Real-Time, Batch, Partner ISV)
Other data systems: Batch, MPP, EDW
ANALYTICS: Data Marts, Business Analytics, Visualization & Dashboards
Applications: Business Analytics, Visualization & Dashboards
Hadoop Driver: Cost optimization
Archive Data off EDW: Move rarely used data to Hadoop as active archive, store more data longer
Offload costly ETL process: Free your EDW to perform high-value functions like analytics & operations, not ETL
Enrich the value of your EDW: Use Hadoop to refine new data sources, such as web and machine data for new analytical context
HDP helps you reduce costs and optimize the value associated with your EDW
ANALYTICS: Data Marts, Business Analytics, Visualization & Dashboards
DATA SYSTEMS: HDP 2.2 (Cold Data, Deeper Archive & New Sources); Enterprise Data Warehouse (Hot); MPP; In-Memory; ELT
SOURCES: Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured; Existing Systems (ERP, CRM, SCM)
Hadoop Driver: Enabling the data lake
SCALE
SCOPE
Data Lake Definition
• Centralized Architecture: Multiple applications on a shared data set with consistent levels of service
• Any App, Any Data: Multiple applications accessing all data affording new insights and opportunities.
• Unlocks ‘Systems of Insight’: Advanced algorithms and applications used to derive new value and optimize existing value.
Drivers:
1. Cost Optimization
2. Advanced Analytic Apps
Goal:
• Centralized Architecture
• Data-driven Business
DATA LAKE
Journey to the Data Lake with Hadoop
Systems of Insight
Only HDP delivers a Centralized Architecture
HDP is uniquely built around YARN serving as a data operating system that provides multi-tenant Resource Management, consistent Governance & Security and efficient Operations services across Hadoop applications.
Hortonworks Data Platform
YARN Data Operating System
• A centralized architecture of consistent enterprise services for resource management, security, operations, and governance.
• The versatility to support multiple applications and diverse workloads from batch to interactive to real-time, open source and commercial.
Key Benefits
• Multiple applications on a shared data set with consistent levels of service: a multitenant data platform.
• Provides a shared platform to enable new analytic applications.
• Delivers maximum cost efficiency for cluster resource management: fewer servers, fewer nodes.
Storage
YARN: Data Operating System
Governance Security
Operations
Resource Management
Existing Applications
New Analytics
Partner Applications
Data Access: Batch, Interactive & Real-time
HDP delivers a completely open data platform
Hortonworks Data Platform 2.2
Hortonworks Data Platform provides Hadoop for the Enterprise: a centralized architecture of core enterprise services, for any application and any data.
Completely Open
• HDP incorporates every element required of an enterprise data platform: data storage, data access, governance, security, operations
• All components are developed in open source and then rigorously tested, certified, and delivered as an integrated open source platform that’s easy to consume and use by the enterprise and ecosystem.
YARN: Data Operating System (Cluster Resource Management)
HDFS (Hadoop Distributed File System)
GOVERNANCE: Apache Falcon, Apache Sqoop, Apache Flume, Apache Kafka
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Apache Pig, Apache Hive, Cascading, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm
SECURITY: Apache Ranger, Apache Knox, Apache Falcon
OPERATIONS: Apache Ambari, Apache ZooKeeper, Apache Oozie
HDP: Any Data, Any Application, Anywhere
Any Application
• Deep integration with ecosystem partners to extend existing investments and skills
• Broadest set of applications through the stable of YARN-Ready applications
Any Data
Deploy applications fueled by clickstream, sensor, social, mobile, geo-location, server log, and other new paradigm datasets with existing legacy datasets.
Anywhere
Implement HDP naturally across the complete range of deployment options
Data sources: Clickstream, Web & Social, Geolocation, Internet of Things, Server Logs, Files, emails, ERP, CRM, SCM
Deployment options: commodity, appliance, cloud, hybrid
Over 70 Hortonworks Certified YARN Apps
Open Source IS the standard for platform technology
Modern platform standards are defined by open communities
For Hadoop, the ASF provides guidelines and a governance framework and the open community defines the standards for Hadoop.
Roadmap matches user requirements not vendor monetization requirements
Hortonworks Open Source Development Model yields unmatched efficiency
• Infinite number of developers under governance of ASF applied to problem
• End users motivated to contribute to Apache Hadoop as they are consumers
• IT vendors motivated to align with Apache Hadoop to capture adjacent opportunities
Hortonworks Open Source Business Model de-risks investments
• Buying behavior changed: enterprise wants support subscription license
• Vendor needs to earn your business, every year is an election year
• Equitable balance of power between vendor and consumer
• IT vendors want platform technologies to be open source to avoid lock-in
Red Hat Big Data
Open the possibilities of your data
Big Data innovation cannot happen in a bubble
Strong partnerships with industry leaders and open source communities
Business User Architect Data Center Operator App Developer
Multiple Silos. Multiple Views. Multiple Goals.
The Old Data Lifecycle
Manage Build Code Query
Business User
Architect Data Center Operator
App Developer
One Language. One View. One Goal.
The New Data Lifecycle
Ingest Integrate
Act Discover
Lack of agile, open, and cost-effective enterprise-grade solutions
Barriers to Big Data Success
I want more than canned BI queries
I am locked into a vendor stack
I want to use my favorite dev framework
I need to integrate data across silos
Business User
Architect Data Center Operator
App Developer
Business User
Architect
Data Center Operator
App Developer
Ingest
Integrate
Act
Discover
Big Data Solutions from Red Hat
Integrated Big Data Platform
Legend: Hortonworks, Cisco, Red Hat
Cisco UCS Integrated Infrastructure for Big Data
Operating System: Red Hat Enterprise Linux
Virtualization: Red Hat Enterprise Virtualization
Cloud: Red Hat Enterprise Linux OpenStack Platform; Cisco OpenStack
Hadoop Compatible File System: Red Hat Storage
Hadoop Distributed File System
Hadoop Data Processing: Map/Reduce, YARN, Pig, Spark, Storm, HBase, Tez, Hive
Analytics
Management: Ambari, Cisco UCS Director Express, Cisco Unified Management
Cisco Security Suite
Operating Environment: Data Integration & Application Development
Application Platform-as-a-Service: OpenShift by Red Hat
Data Integration and Data Services: Red Hat JBoss Data Virtualization; Composite
Data Caching: Red Hat JBoss Data Grid
Business Rules Mgmt: Red Hat JBoss BRMS
Development: Red Hat JBoss Developer Studio
Software and Solutions Innovation Empowering What’s Next
Ron Graham Big Data Analytics Engineer
Hardware Architecture Cisco UCS with Big Data
20 © 2014 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
Why Cisco UCS for Big Data?
• Manageability
• Save time with UCS Manager
• Enables consistent and rapid deployments using UCS Service profiles
• Offers operational simplification
• Delivers a modular solution
• Scalability
• Performance
SIM Card: Identity for a phone
Service Profile: Identity for a server
• End to end provisioning, installation, and monitoring tool for Hadoop Clusters
• Better business outcomes with faster time to value from Big Data
• Provides an appliance-like experience without the inflexibility
• Centralized visibility across Hadoop and physical infrastructure
• Powerful interface for further integration into third party tools and services
UCS Director Express for Big Data
End to end solution for Hadoop
Powering Big Data and Analytics
Complete and Industry-leading Portfolio
UCS B200 (Scale-out Analytics)
Big Data with EMC Isilon and VCE
Invicta (Fast Data)
UCS C240 (Hadoop, NoSQL, MPP)
C/B460 (In-memory Analytics)
UCS C3160, C3260 (Hadoop)
UCS C220 (real-time, streaming)
FlexPod Select with NetApp E-Series
UCS Mini (All-in-one at Edge)
UCS M-Series (Massive scale-out)
UCS Manager, Director, Express, Central, Red Hat, ACI
Actian, DataStax, Hortonworks, MongoDB, Pivotal, SAP, SAS, Splunk
Cisco, Elastic Search, IBM, Informatica, Microsoft, MicroStrategy, Oracle, SAP, SAS and others
Ecosystem Partners
ISV Partners
Infrastructure Management
Data Management
Applications
Cisco Validated Designs Accelerate Deployment
Cisco Validated Designs for leading big data platforms can be found at: www.cisco.com/go/bigdata
Cisco UCS CPA for Big Data v3 Reference Architecture and Bundles
Starter: Server 8x UCS C220 M4; CPU 2 x Intel Xeon E5-2620 v3 (15M Cache, 2.40 GHz); Memory 256GB; Storage 8 x 1.2-TB 10K SAS SFF HDD
High Performance: Server 8x UCS C220 M4; CPU 2 x Intel Xeon E5-2680 v3 (30M Cache, 2.50 GHz); Memory 384GB; Storage 2 x 1.2-TB 10K SAS SFF HDD, 6 x 400-GB SAS SSD
Performance Optimized: Server 16x UCS C240 M4; CPU 2 x Intel Xeon E5-2680 v3 (30M Cache, 2.50 GHz); Memory 256GB; Storage 2 x 120-GB SATA SSD, 24 x 1.2-TB 10K SAS SFF HDD
Capacity Optimized: Server 16x UCS C240 M4; CPU 2 x Intel Xeon E5-2620 v3 (15M Cache, 2.40 GHz); Memory 128GB; Storage 2 x 120-GB SATA SSD, 12 x 4-TB 7.2K SAS SFF HDD
Extreme Capacity: Server 2x UCS C3160; CPU 2 x Intel Xeon E5-2695 v2 (30M Cache, 2.40 GHz); Memory 256GB; Storage 2 x 120-GB SATA SSD, 60 x 4-TB 7.2K SAS SFF HDD
64 Node Cluster Configuration
2x UCS 6296 Series Fabric Interconnect
UCS Manager
• UCS Domain (68 Servers)
• Managed by UCS Manager
• 2.8 PB of storage
• HDP 2.2
• Tiered Storage
• Tez
• RHEL 6.5
• Dual 10G Network
• 17 Servers Per Rack
UCS C240 M4: 2x E5-2680 v3; 256GB Memory; Cisco 12Gb/s SAS RAID Controller; 2x 120GB SATA SSD; 24x 1.2TB 10k SAS; 2x Cisco UCS VIC 1227
UCS C3160: 2x E5-2695 v2; 256GB Memory; Cisco 12Gb/s SAS RAID Controller; 2x 120GB SATA SSD; 60x 4TB 7.2k SAS SFF; 2x Cisco UCS VIC 1227
17x 10Gb Ethernet
17x 10Gb Ethernet
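The 2.8 PB figure can be reproduced from the per-server drive counts on this slide. The sketch below is a back-of-the-envelope check in Python. The split of the 68-server domain into 64 UCS C240 M4 nodes and 4 UCS C3160 nodes is an assumption (the slide gives only the totals), and the 120GB boot SSDs are left out.

```python
# Raw data-disk capacity check for the cluster described on the slide.
# Assumption (not stated on the slide): the 68-server UCS domain is
# 64 x UCS C240 M4 plus 4 x UCS C3160. The 120GB boot SSDs are excluded.

TB_PER_PB = 1000  # decimal units, as drive capacities are usually quoted


def raw_capacity_tb(servers: int, drives_per_server: int, tb_per_drive: float) -> float:
    """Raw (pre-replication) capacity in TB for one server type."""
    return servers * drives_per_server * tb_per_drive


c240_tb = raw_capacity_tb(servers=64, drives_per_server=24, tb_per_drive=1.2)
c3160_tb = raw_capacity_tb(servers=4, drives_per_server=60, tb_per_drive=4.0)
total_pb = (c240_tb + c3160_tb) / TB_PER_PB

print(f"C240 M4 tier: {c240_tb:.1f} TB")
print(f"C3160 tier:   {c3160_tb:.1f} TB")
print(f"Total raw:    {total_pb:.2f} PB")
```

Under that assumed split the total comes to roughly 2.8 PB of raw disk. This is capacity before HDFS replication, so usable space depends on the replication factor chosen for each tier.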
Unified Management
Programmability, Scalability and Automation
UCSD Express
UCS 6200 Series Fabric Interconnect
UCS Manager
UCS C240 M4 Series Rack Server
UCS C3160 Rack Server
Apache Ambari
Hadoop Cluster Profile Template: Red Hat RHEL 6.5, HDFS, CLDB, YARN, ZooKeeper, HBase, Hive, Oozie, Hue, Spark, Key-Value Store Indexer, Solr, Sqoop, Impala, Flume, Pig, Mahout, Falcon, Tez, Storm, Ganglia
Cisco UCS Service Profile: NIC MACs, HBA WWNs, Server UUID, VLAN Assignments, VLAN Tagging, FC Fabrics Assignments, FC Boot Parameters, Number of vNICs, Boot order, PXE settings, IPMI Settings, Number of vHBAs, QoS, Call Home, Template Association, Org & Sub Org Assoc., Server Pool Association, Statistic Thresholds, BIOS scrub actions, Disk scrub actions, BIOS firmware, Adapter firmware, BMC firmware, RAID settings, Advanced NIC settings, Serial over LAN settings, BIOS Settings
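The idea of a service profile as a server's identity, like a SIM card for a phone, can be illustrated with a small data model. This is a conceptual sketch in Python, not the UCS Manager API: the class names, the fields, the sample values, and the `move_profile` helper are all hypothetical, and only a handful of the service profile attributes from the slide are modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ServiceProfile:
    """Hypothetical, simplified stand-in for a UCS service profile."""
    name: str
    server_uuid: str
    nic_macs: List[str]
    vlans: List[int]
    boot_order: List[str]


@dataclass
class Server:
    """A physical server; it carries no identity until a profile is associated."""
    slot: str
    profile: Optional[ServiceProfile] = None


def move_profile(source: Server, target: Server) -> None:
    """Move the identity (profile) from one physical server to another."""
    if source.profile is None:
        raise ValueError(f"{source.slot} has no profile to move")
    if target.profile is not None:
        raise ValueError(f"{target.slot} already has a profile")
    target.profile, source.profile = source.profile, None


# Illustrative values only.
datanode = ServiceProfile(
    name="hadoop-datanode-01",
    server_uuid="00000000-0000-0000-0000-000000000001",
    nic_macs=["00:25:B5:00:00:01", "00:25:B5:00:00:02"],
    vlans=[10, 11],
    boot_order=["pxe", "local-disk"],
)
failed = Server(slot="rack1/server03", profile=datanode)
spare = Server(slot="rack1/server17")

move_profile(failed, spare)
print(spare.slot, "now runs as", spare.profile.name)
```

The point of the model is that MACs, UUID, VLANs and boot settings travel with the profile, so a replacement server takes over the same identity without reconfiguring the network or the Hadoop cluster.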
Multi-tiered Storage Architecture: Multi-temperature Policy
UCS 6200 Series Fabric Interconnect
UCS C240 M4 Series Rack Server
UCS C3160 Rack Server
Hot: All (n) replicas on Disk
Warm: 1 replica on Disk, n-1 on Archive
Cold: n replicas on Archive
Policy
Hot - for both storage and compute. The data that is popular and still being used for processing will stay in this policy. When a block is hot, all replicas are stored in DISK.
Warm - partially hot and partially cold. When a block is warm, some of its replicas are stored in DISK and the remaining replicas are stored in ARCHIVE.
Cold - only for storage with limited compute. The data that is no longer being used, or data that needs to be archived is moved from hot storage to cold storage. When a block is cold, all replicas are stored in ARCHIVE.
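The three policies reduce to a rule for where each of a block's n replicas is placed. Below is a minimal Python sketch of that rule, following the slide's definitions. The function name and the list-of-strings representation are illustrative and are not HDFS code.

```python
def replica_placement(policy: str, replication: int = 3) -> list:
    """Return the storage type for each replica of a block under a policy."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    policy = policy.upper()
    if policy == "HOT":
        # All n replicas on DISK.
        return ["DISK"] * replication
    if policy == "WARM":
        # One replica on DISK, the remaining n-1 on ARCHIVE.
        return ["DISK"] + ["ARCHIVE"] * (replication - 1)
    if policy == "COLD":
        # All n replicas on ARCHIVE.
        return ["ARCHIVE"] * replication
    raise ValueError(f"unknown storage policy: {policy}")


for name in ("HOT", "WARM", "COLD"):
    print(name, replica_placement(name))
```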
Mover – A new data migration tool
It periodically scans the files in HDFS to check if the block placement satisfies the storage policy. For the blocks violating the storage policy, it moves the replicas to a different storage type in order to fulfill the storage policy requirement.
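The Mover's scan-and-fix behavior can be sketched as follows. This is a toy simulation in Python, not the HDFS implementation: blocks are plain dictionaries with made-up IDs, `required_types` restates the Hot/Warm/Cold rules for n replicas, and `plan_moves` is a hypothetical helper that only reports which replicas would have to change storage type.

```python
from collections import Counter


def required_types(policy: str, n: int) -> Counter:
    """How many replicas each storage type should hold for a policy."""
    disk = {"HOT": n, "WARM": min(1, n), "COLD": 0}[policy]
    return Counter({"DISK": disk, "ARCHIVE": n - disk})


def plan_moves(policy: str, replicas: list) -> list:
    """Return (from_type, to_type) moves needed to satisfy the policy."""
    want = required_types(policy, len(replicas))
    have = Counter(replicas)
    surplus = list((have - want).elements())  # replicas on the wrong storage type
    deficit = list((want - have).elements())  # storage types still short of replicas
    return list(zip(surplus, deficit))


# Illustrative block map: policy per block and where its replicas currently sit.
blocks = {
    "blk_A": {"policy": "HOT", "replicas": ["DISK", "DISK", "DISK"]},
    "blk_C": {"policy": "COLD", "replicas": ["DISK", "DISK", "ARCHIVE"]},
    "blk_D": {"policy": "WARM", "replicas": ["DISK", "DISK", "DISK"]},
}

for block_id, info in blocks.items():
    moves = plan_moves(info["policy"], info["replicas"])
    print(block_id, moves or "already satisfies policy")
```

Blocks whose placement already matches the policy produce no moves. The others yield one move per misplaced replica, which mirrors the slide's description of migrating only the replicas that violate the policy.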
Next Steps…
Download the Hortonworks Sandbox Learn Hadoop
Build Your Analytic App
Try Hadoop
Learn more with our partnerships
http://hortonworks.com/partner/cisco/
http://hortonworks.com/partner/redhat/
Joint CVD bit.ly/Cisco-CVD
Meet us in person!
• Cisco Live! in San Diego – June 7-11
• Hadoop Summit in San Jose – June 9-11
• Red Hat Summit in Boston – June 23-26
For more information about Red Hat’s Big Data solutions, please visit:
• redhat.com/bigdata
• redhatstorage.redhat.com/category/big-data
• redhat.com/en/insights/big-data
For more information about Cisco’s Big Data and Analytics offers, please visit:
• www.cisco.com/go/bigdata and www.cisco.com/go/bigdata_design
• http://blogs.cisco.com/author/raghunathnambiar
• bit.ly/Cisco-CVD