SAP FORUM İSTANBUL 2016 - YOU CAN ANALYZE YOUR BIG DATA WITH SAP HANA

SAP FORUM İSTANBUL Reimagine Business for the Digital Economy

How to Analyze Your Big Data with SAP HANA

Speaker’s Name : Ilker Tasdemir

Department : Director of Professional Services and Service

© 2016 SAP SE or an SAP affiliate company. All rights reserved.

Agenda

MDS ap Company Introduction

Big Data Dilemma

Is Hadoop = Big Data?

SAP HANA and Hadoop Use Cases

Why Do We Need SAP?

Who Are We? MDS ap Tech Company Introduction


Eastern Europe

Middle East

Africa

45+ years of Experience in IT (Since 1967)

4500+ Employees in 30 countries across 3 continents

150+ companies unified under the group

100+ top reseller awards from global IT Leaders

A 3.5 billion USD Leader offering stability & high Integrity in Technology & Solutions

SAP Partner Centre of Excellence

MDS ap Tech Overview

a MIDIS Group Company

Over 24 years of in-depth experience helping customers Manage, Integrate, Analyze and Mobilize business mission-critical data across the enterprise; exceptional track record providing Turnkey IT Solutions across Turkey, the Middle East & Europe.

A Unique Partnership with SAP; Implementing Excellence; Optimizing Application Management

Strategic long-term partnerships with our customers; Focusing on Customer Satisfaction and Technology Innovation

Help customers better use their data assets to improve business performance and make smarter decisions


The MDS ap Differentiator

MDS ap, with the best-of-breed SAP Business Analytics Platform, provides a complete Agile Visualization & Advanced Analytics solution that optimizes any data variety, regardless of its structure, at real-time velocity, to deliver next-generation analytics


Customer Base: Over 400 Enterprise Customers

• Turkey Customers:

• Akbank

• Albaraka Türk

• Anadolu Sigorta

• DHL

• Halk Emeklilik

• Halk Sigorta

• İETT

• ING Bank

• İş Yatırım

• Meteoroloji Genel Müdürlüğü

• PTT

• T.C. Maliye Bakanlığı

• T.C. Orman Ve Su İşleri Bakanlığı

• TEB

• Toprak Mahsulleri Ofisi

• Turkcell

• Türkiye Finans Katılım Bankası

• Türkiye İş Bankası

• VakıfBank

• Ziraat Bankası

• Regional Customers:

• Abu Dhabi Investment Authority (Adia)

• ADIB

• Ahlibank

• Bank Dhofar

• Bank Nizwa

• BISB

• Boubyan Bank

• Emirates NBD

• Kuwait Credit Bank

• Kuwait Finance House

• Orange

• Qatar Islamic Bank

• RTA

• Saudi Arabian Monetary Agency

• Saudi Credit Bureau


Rich Ecosystem: Over 25 Partners

http://www.bradmark.com/index.html

Big Data Dilemma: 3 V’s, 3+1 V’s, 5 V’s, 6 V’s of Big Data


BIG DATA DRAMATICALLY ACCELERATED THE OBSOLESCENCE OF THE IT LANDSCAPE

[Figure: data sources shown include CRM data, GPS, Demand, Speed, Velocity, Transactions, Opportunities, Service calls, Customer, Sales orders, Inventory, E-mails, Tweets, Planning, M2M, Mobile, Instant messages]

[Figure: the V’s of Big Data: Volume, Variety, Velocity, Value, Variability, Validity; labeled COMPLEX]


[Figure: eras of computing, with labels Hobbyist, Desktop, Internet, Big Data, The Future?]

Byte : one grain of rice

Kilobyte : cup of rice

Megabyte : 8 bags of rice

Gigabyte : 3 Semi trucks

Terabyte : 2 Container Ships

Petabyte : Blankets Manhattan

Exabyte : Blankets west coast states

Zettabyte : Fills the Pacific Ocean

Yottabyte : AN EARTH SIZE RICE BALL!

How did we reach here?

[Figure: timeline with labels 18,000 BC, 1928, 1991, 2008, 2012]

NSA's 1,500,000 square foot data center being built outside Salt Lake City will be the first facility to house a yottabyte of data


Typical “Best-Practice” Approach


• Drop useful data by introducing ETL “bias”

• Potentially insightful data is lost

• Create latency as volumes increase and sources change

• Duplicate data through staging environments to support ETL

• Expensive “reactive” hardware to support processing scale requirements

Impact if we keep the current architecture

Is Hadoop = Big Data?


What is Hadoop?

Distributed, scalable system on commodity HW

Composed of a few parts:

HDFS – Distributed file system

MapReduce – Programming model

Other tools: Hive, Pig, Sqoop, HCatalog, HBase, Flume, Mahout, YARN, Tez, Spark, Stinger, Oozie, ZooKeeper, Storm

Main players are Hortonworks, Cloudera, MapR

WARNING: Hadoop, while ideal for processing huge volumes of data, is inadequate for analyzing that data in real time (companies do batch analytics instead)
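The MapReduce programming model named above can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle and reduce steps on a single machine, not Hadoop's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample log lines are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and yield (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(_key, values):
    """Sum the ones emitted for a word."""
    return sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)


if __name__ == "__main__":
    # Hypothetical log lines standing in for files stored in HDFS.
    log_lines = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(log_lines))
```

On a real cluster the map and reduce steps run in parallel across many nodes and the shuffle moves data over the network, which is where the batch latency mentioned in the warning comes from.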

Core Services

[Figure: Hadoop core services, grouped into Operational Services and Data Services. Labels shown: HDFS, Sqoop, Flume, NFS, WebHDFS, Load & Extract, Oozie, Ambari, YARN, MapReduce, Hive & HCatalog, Pig, HBase, Falcon]

[Figure: Hadoop Cluster made up of compute & storage nodes]

Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware


What does Hadoop Provide?

• A place to store unlimited amounts of data in any format inexpensively

• Allows collection of data that you may or may not use later: “just in case”

• A way to describe any large data pool in which the schema and data requirements are not defined until the data is queried: “just in time” or “schema on read”

• Complements EDW and can be seen as a data source for the EDW – capturing all data but only passing relevant data to the EDW

• Frees up expensive EDW resources (storage and processing), especially for data refinement

• Allows for data exploration to be performed without waiting for the EDW team to model and load the data

• Some processing is better done on Hadoop than in ETL tools

• Also called bit bucket, staging area, landing zone or enterprise data hub

• Typical players are Hortonworks, Cloudera, MapR
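The "schema on read" idea in the list above can be illustrated with a minimal Python sketch: raw records are stored untouched, and a schema is applied only when a query runs. The store contents, the field names and the `read_with_schema` helper are all hypothetical and chosen only for the example.

```python
import json

# Raw events are kept exactly as they arrived, with no upfront schema.
# The sample records below are invented for illustration.
RAW_STORE = [
    '{"device": "pump-1", "temp_c": "71.5", "ts": "2016-03-01T10:00:00"}',
    '{"device": "pump-2", "temp_c": "64.0"}',
    'not even json',
    '{"device": "pump-1", "temp_c": "80.2", "extra": "ignored"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> cast function) at query time.

    Records that cannot be parsed or do not fit the schema are skipped,
    but they stay in the raw store for other schemas to use later.
    """
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        try:
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue


if __name__ == "__main__":
    # Two different "questions" asked of the same raw data.
    temperature_schema = {"device": str, "temp_c": float}
    timestamp_schema = {"device": str, "ts": str}
    print(list(read_with_schema(RAW_STORE, temperature_schema)))
    print(list(read_with_schema(RAW_STORE, timestamp_schema)))
```

The same raw store answers both queries without reloading, which is the "just in case" collection and "just in time" interpretation described above.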


The Real Cost of Hadoop

http://www.wintercorp.com/tcod-report/





Hadoop versus HANA

Data Architecture
• Hadoop: Unstructured data and files on disk
• HANA: Structured data in memory

Data Structures
• Hadoop: No predefined schema (Schema On Read)
• HANA: Predefined schemas and models

Performance
• Hadoop: Slow data access, seconds to hours
• HANA: Very fast, milliseconds to seconds

Scalability
• Hadoop: Scale-out to hundreds and thousands of commodity nodes
• HANA: Scale-up/Scale-out to many servers

Data Consistency
• Hadoop: BASE (Basically Available, Soft State, Eventual Consistency)
• HANA: ACID (Atomicity, Consistency, Isolation, Durability)

Licensing Cost
• Hadoop: Free open source or commercial open source
• HANA: Many options from cloud to enterprise

SAP HANA and Hadoop Use Cases


When to use Hadoop vs HANA

• Hadoop has the lowest storage cost and highest data type flexibility, but also the slowest processing speed

• SAP HANA has the highest processing speed and data conformity, but is also more limited by cost and data type

• Key is to leverage strengths of both platforms

• Hadoop + HANA = Infinite Storage and Instant Insight!


Hadoop as Flexible Data Store

Use Hadoop to capture all types of data from multiple sources

• SAP and non-SAP, internal and external sources

• Full fidelity, lowest level granularity capture, and storage of data of any type allows preservation of data for future use

• Store and retrieve very large data sets and objects

• Aggregate and consolidate OLTP data in Hadoop to create OLAP fact tables for SAP HANA

• Feed SAP HANA, SAP BusinessObjects, Predictive Analytics via Hive, or Data Services ETL

• Interactive Big Data Exploration


Flexible Data Store Example

Data : Descriptions

Data Stream Capture : Real-time capture of high-volume data streams such as machine-generated log and sensor data, real-time Web logs

Document and Multimedia Storage : Very high volume storage of business documents (healthcare, insurance). Rapid high-volume storage and retrieval of media and BLOBs for social and Web applications like Facebook using HBase

Social Media and Email : Real-time capture of social and email text data for sentiment analytics, email archiving

OLTP Transaction Data : Capture of high-volume OLTP transactions such as call centre, inventory, and any other process transactions. Aggregate transactions and build OLAP fact tables for SAP HANA. ETL via SAP Data Services to SAP HANA

Reference Data : Copy of existing large reference data sets such as GIS, survey, industry-specific data sets can be combined with other data for analytics

Data Archive : Archive of system logs, audit data, and other data that otherwise would go to long-term, off-site storage


Hadoop as a Processing Engine

Use Hadoop as a data processing engine for ETL rationalization to feed SAP HANA

• MapReduce programs execute process logic

• Pig for data analysis

• Mahout for data mining and machine learning

• Replicate master data to Hadoop for data processing

• Feed results to SAP HANA with Data Services and merge with conformed data model


Processing Engine Example

Data : Descriptions

Data Cleansing and Enrichment : Fix data issues in Hadoop, enhance with additional information

ETL Rationalization : Low-latency ingestion of data from operational systems. Tiered storage: High-valued data loaded and transformed in HANA in parallel, off-load preprocessing to Hadoop

Data Mining and Predictive Analytics : Correlation, clustering, regression analysis. Predict machine failure, correlate customer behavior across systems

Identify differences : Differences in large, but different sets of data such as DNA analysis

Risk Analysis : Fraud detection, identity risk patterns


Hadoop and SAP HANA for Analysis

How does Hadoop fit into the Data Analytics Process with SAP HANA and BI?

• Hadoop can store such high volumes of data that it often can’t be replicated into SAP HANA in a cost-effective or timely manner

• Some of the analysis must be done in Hadoop, as well as in SAP HANA

• Queries executed in Hadoop take much longer to run than in SAP HANA

• Analysis will likely require combining data from Hadoop, SAP HANA and other data sources

Combined Analytics:

• Two Phase Analytics

• Addresses long-running Hadoop query times

• Run analysis continually on Hadoop, then send periodic updates to SAP HANA for fast interactive query response

• Federated Queries

• Split analysis into parts and run asynchronously on Hadoop, SAP HANA and other systems

• Federate results in SAP HANA or BI


Two Phase Analytics

• Hadoop runs data mining, statistical analysis, OLAP fact table generation – “slow” analytics

• SAP Data Services ETL process pushes results to SAP HANA for “fast” analytics
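The two phases above can be sketched in Python under heavy simplification. Here an in-memory SQLite database stands in for SAP HANA and a plain Python aggregation stands in for the Hadoop batch job; neither is the real product API. The table name `sales_fact`, the function names and the sample transactions are assumptions for illustration only.

```python
import sqlite3
from collections import defaultdict


def slow_batch_phase(raw_transactions):
    """Phase 1 (stand-in for a Hadoop batch job): build an OLAP-style fact table.

    Aggregates raw (region, product, amount) rows into
    (region, product, order_count, revenue) rows.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for region, product, amount in raw_transactions:
        entry = totals[(region, product)]
        entry[0] += 1
        entry[1] += amount
    return [(r, p, n, total) for (r, p), (n, total) in sorted(totals.items())]


def load_fast_store(conn, fact_rows):
    """Periodic update (stand-in for the ETL push): replace the fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(region TEXT, product TEXT, order_count INTEGER, revenue REAL)"
    )
    conn.execute("DELETE FROM sales_fact")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", fact_rows)
    conn.commit()


def fast_query(conn, region):
    """Phase 2: interactive query against the small pre-aggregated table."""
    row = conn.execute(
        "SELECT COALESCE(SUM(revenue), 0) FROM sales_fact WHERE region = ?",
        (region,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    # Hypothetical raw OLTP rows: (region, product, amount).
    raw = [("TR", "A", 10.0), ("TR", "B", 5.0), ("AE", "A", 7.5), ("TR", "A", 2.5)]
    conn = sqlite3.connect(":memory:")
    load_fast_store(conn, slow_batch_phase(raw))
    print(fast_query(conn, "TR"))
```

The interactive query only ever touches the small fact table, so its response time does not depend on how long the batch phase took.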


Federated Queries

• Split Analysis into multiple queries, consolidate results


Federation Scenarios

Client Side Federation

• BI Tool queries separately and combines the results

• Only for smaller data and result sets

Query Federation:

• Server-side execution of multiple queries and results combined

• Better for large data sets

Data Federation:

• Hadoop data virtualized as a table by another database like SAP HANA
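Client-side federation, the first scenario above, might look like the following Python sketch. Two separate in-memory SQLite databases stand in for the Hadoop side and the SAP HANA side; the table names (`clickstream`, `customers`), the helper names and the sample rows are hypothetical.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an independent in-memory database acting as one data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


def federated_clicks_per_segment(hadoop_like, hana_like):
    """Query each source separately, then combine the results in the client."""
    # Query 1: aggregate on the large source so only a small result set returns.
    clicks = dict(
        hadoop_like.execute(
            "SELECT customer_id, COUNT(*) FROM clickstream GROUP BY customer_id"
        )
    )
    # Query 2: fetch curated master data from the fast source.
    segments = dict(hana_like.execute("SELECT customer_id, segment FROM customers"))
    # Client-side join: works only while both result sets fit in memory.
    result = {}
    for customer_id, click_count in clicks.items():
        segment = segments.get(customer_id, "unknown")
        result[segment] = result.get(segment, 0) + click_count
    return result


if __name__ == "__main__":
    hadoop_like = make_source(
        "CREATE TABLE clickstream (customer_id INTEGER, url TEXT)",
        "INSERT INTO clickstream VALUES (?, ?)",
        [(1, "/a"), (1, "/b"), (2, "/a"), (3, "/c")],
    )
    hana_like = make_source(
        "CREATE TABLE customers (customer_id INTEGER, segment TEXT)",
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "retail"), (2, "corporate")],
    )
    print(federated_clicks_per_segment(hadoop_like, hana_like))
```

Because the join happens in the client, both result sets must be shipped to it and held in memory, which is why this scenario suits only smaller data and result sets.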


SAP HANA and Hadoop Integration

SAP HANA can integrate with Hadoop

• Smart Data Access

• Virtual Table created in SAP HANA points to remote Hive source, queries pushed down to Hive

• SAP Data Services

• Connect via Hive, HDFS

• Push MapReduce jobs to Hadoop with Pig scripts

• SAP HANA Vora

• Native Spark processing with push-down logic to Hadoop

• Vora Adapter for HANA to utilize SDA
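The effect of pushing a query down to the remote source through a virtual table can be shown with a toy Python model. This is not the Smart Data Access API or SAP HANA SQL; `RemoteSource`, `VirtualTable` and the `rows_transferred` counter are hypothetical names invented to make the data movement visible.

```python
class RemoteSource:
    """Hypothetical stand-in for a remote Hive source holding many rows."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.rows_transferred = 0  # counts rows sent back to the caller

    def scan(self, predicate=None):
        """Evaluate the optional predicate remotely and return matching rows."""
        matched = [r for r in self._rows if predicate is None or predicate(r)]
        self.rows_transferred += len(matched)
        return matched


class VirtualTable:
    """Looks like a local table but holds no data; it forwards to the source."""

    def __init__(self, source):
        self._source = source

    def select(self, predicate, pushdown=True):
        if pushdown:
            # The filter runs at the source; only matching rows move.
            return self._source.scan(predicate)
        # Without pushdown every row is transferred and filtered locally.
        return [r for r in self._source.scan() if predicate(r)]


if __name__ == "__main__":
    rows = [{"id": i, "status": "error" if i % 10 == 0 else "ok"} for i in range(100)]

    def is_error(row):
        return row["status"] == "error"

    pushed = RemoteSource(rows)
    VirtualTable(pushed).select(is_error)
    pulled = RemoteSource(rows)
    VirtualTable(pulled).select(is_error, pushdown=False)
    print(pushed.rows_transferred, pulled.rows_transferred)
```

Both calls return the same rows; the difference is how many rows cross the boundary between the two systems.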


“Typical” versus “Innovative”

• Entire “universe” of data is captured and maintained

• Mining of data via transformation on read leaves all data in place

• Refineries leverage the power of the cloud and traditional technologies

• Integration with traditional data warehousing methodologies

• Scale can be pushed to cloud for more horsepower

• Orchestration of data is a reality (less rigid, more flexible, operational)

• Democratization of predictive analytics, data sets, services and reports

Why Do We Need SAP?


What’s the meaning of Life, Universe and Everything?

In the radio series and the first novel (1978), a group of hyper-intelligent pan-dimensional beings demand to learn the Answer to the Ultimate Question of Life, The Universe, and Everything from the supercomputer, Deep Thought, specially built for this purpose.

https://youtu.be/aboZctrHfK8

It takes Deep Thought 7½ million years to compute and check the answer.

https://en.wikipedia.org/wiki/Deep_Thought_(The_Hitchhiker's_Guide_to_the_Galaxy)



The Answer to Life, Universe and Everything is…


What’s the Meaning of 42?

“The answer seems meaningless because the beings who instructed it never actually knew what the Question was”

“Deep Thought can build a machine to calculate the real question in 10M years”


Conclusion

“We don’t have 7½ million years to ask silly questions.

We don’t have 10 million years to decide what question to ask”

Thank you

Contact information:

Ilker Tasdemir

Director of Professional Services and Service

[email protected]

+90 532 549 9392 / +971 50 712 9169
