Prof. Uri Weiser, Technion

Post on 09-Jan-2017


Professor Uri Weiser, Technion, Haifa, Israel

Handling Memory Accesses in Big Data Environment

Chipex 2016

The talk covers research done by: T. Horowitz, Prof. A. Kolodny, T. Morad, Prof. A. Mendelson, Daniel Raskin, Gil Shomron, Loren Jamal, Prof. U. Weiser


New Architecture Avenues in the Big Data Environment

The era of heterogeneity: HW/SW fits the application, dynamic tuning, accelerators → performance and energy efficiency

Big Data = big: in general, non-repeated access to all the data

“Big Data” – what are the implications?

Heterogeneous computing: Application-Specific Accelerators

Continue the performance trend with architectures tuned to bypass current technological hurdles.

[Chart: performance/power vs. apps range – accelerators vs. tuned architectures, matched to apps behavior]


New Architecture Avenues in the Big Data Environment

Heterogeneous computing – "tuning" HW to respond to specific needs

Example: Big Data memory access pattern → potential savings

Reduction of data movements; bypassing DRAM

Bandwidth issue → potential solution

Big Data usage of DATA

[Diagram: (A) data structuring = ETL – unstructured input data is structured (aggregation) through a Funnel (β = BW_OUT / BW_IN), a read-once, non-temporal memory-access stage; (B) ML model creation; (C) model usage @ client]


Machine Learning


Does Big Data exhibit a special memory access pattern?

It probably should, since revisiting ALL Big Data items would cause huge/slow data transfers from the data sources.

There are two access modes of memory operations:
Temporal Memory Access
Non-Temporal Memory Access

Many Big Data computations exhibit Non-Temporal Memory Accesses and/or a Funnel operation

Non-Temporal Memory Access – initial analysis: Hadoop-grep single memory access pattern

~50% of Hadoop-grep unique memory references are single access
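A statistic like this can be extracted from a memory trace by counting how many unique addresses are referenced exactly once. A minimal sketch (the trace below is synthetic, not actual Hadoop-grep data):

```python
from collections import Counter

def single_access_fraction(trace):
    """Fraction of unique addresses that are referenced exactly once."""
    counts = Counter(trace)
    singles = sum(1 for c in counts.values() if c == 1)
    return singles / len(counts)

# Synthetic trace: addresses 0..3 are revisited (temporal locality),
# addresses 100..103 are read once (non-temporal).
trace = [0, 1, 2, 3] * 5 + [100, 101, 102, 103]
print(single_access_fraction(trace))  # 4 of 8 unique addresses -> 0.5
```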


Non-Temporal Memory Accesses – Preliminary Results

WordCount: access to storage shows non-temporal locality

Sort: access to storage shows NO non-temporal locality

[Plots: WordCount I/O utilization and SORT I/O – access rate [KB/s] vs. time [s]]


Where is energy wasted?

• DRAM

• Limited BW

From: Mark Horowitz, Stanford “Computing’s Energy Problems”

From: Bill Dally (nVidia and Stanford), "Efficiency and Parallelism, the Challenges of Future Computing"

Energy: DRAM


Memory Subsystem - copies

Core ← Registers (KBs) ← L1$ (10's KBs) ← L2$ (MBs) ← LL Cache (10's MBs) ← DRAM (GBs) ← NV Storage (TBs)

Bandwidth along the path, from source to core: 3 GB/sec, 25 GB/sec, 500 GB/sec, TB/sec

Data moves from the source through five copies:
Copy 1 (main memory) → Copy 2 (LL Cache) → Copy 3 (L2 Cache) → Copy 4 (L1 Cache) → Copy 5 (Registers) – destination


Memory Subsystem – DRAM bypass == DDIO

Core ← Registers ← L1$ ← L2$ ← LL Cache ← DRAM ← NV Storage

Bandwidth along the path, from source to core: 3–20 GB/sec, 25 GB/sec, 500 GB/sec, TB/sec

Copies: Copy 1 (main memory) → Copy 2 (LL Cache) → Copy 3 (L2 Cache) → Copy 4 (L1 Cache) → Copy 5 (Registers) – destination; DDIO bypasses DRAM, eliminating the main-memory copy

Potential savings: at 0.5 nJ/B (DRAM) and 10–20 GB/s NV BW → 5 W – 10 W

Reference: "Optimizing Read-Once Data Flow in Big-Data Applications", Morad, Shomron, Erez, Weiser, Kolodny, Computer Architecture Letters, 2016
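The 5 W – 10 W figure is the product of the slide's two numbers: the DRAM energy per byte times the streaming bandwidth whose DRAM staging the bypass avoids. As a quick check:

```python
def dram_power_saved(nv_bw_bytes_per_s, dram_energy_per_byte_j=0.5e-9):
    """Power avoided by not staging streamed bytes in DRAM (0.5 nJ/B from the slide)."""
    return nv_bw_bytes_per_s * dram_energy_per_byte_j

print(dram_power_saved(10e9))  # 10 GB/s -> 5 W
print(dram_power_saved(20e9))  # 20 GB/s -> 10 W
```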

Bandwidth: when should we use the Funnel at the data source?


A: Bandwidth issue – the memory hierarchy is optimized for temporal locality; systems are built for it

Highest bandwidth sits at the top of the hierarchy:
Core ← Registers (KBs) ← L1$ (10's KBs) ← L2$ (MBs) ← LLC (10's MBs) ← DRAM (GBs) ← NV Storage (TBs)

Existing BW, from source to core: 3–20 GB/sec, 25 GB/sec, 500 GB/sec, TB/sec

NTMA desired BW: high bandwidth at the data source

[Plots – initial results: bandwidth [MB/s] and CPU utilization [%] vs. # of cores]

Read Once – Non-Temporal Memory Accesses: SSD bandwidth = CPU bandwidth

Temporal Memory Accesses: CPU bandwidth, SSD bandwidth, CPU utilization

Hint: memory accesses per operation
B: Memory accesses per operation impact BW
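The hint can be made concrete with a toy model: the bandwidth a workload demands grows with cores × operation rate × memory accesses per operation × bytes per access, and the delivered bandwidth flattens once the storage link saturates. All numbers below are hypothetical, chosen only to show the shape of the curves:

```python
def demanded_bw(cores, ops_per_s, accesses_per_op, bytes_per_access):
    """Aggregate memory bandwidth the cores ask for."""
    return cores * ops_per_s * accesses_per_op * bytes_per_access

def delivered_bw(cores, ops_per_s, accesses_per_op, bytes_per_access, link_bw):
    """Demand clipped at the link's capacity (the SSD/PCIe bottleneck)."""
    return min(demanded_bw(cores, ops_per_s, accesses_per_op, bytes_per_access),
               link_bw)

# Hypothetical scan: 1 Gops/s per core, 0.2 memory accesses/op, 8 B each,
# fed from a 3 GB/s storage link.
for cores in (1, 2, 4, 8):
    print(cores, delivered_bw(cores, 1e9, 0.2, 8, 3e9))
# Bandwidth stops scaling with core count once the link saturates.
```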


Solution:

Flow of “Non-Temporal Data Accesses”

[Diagram: Core ← Registers ← L1$ ← L2$ ← LLC ← DRAM ← NV Storage. Temporal-locality accesses use the existing cache hierarchy; Non-Temporal Memory Accesses (NTMA) are handled by the Funnel at the data source]


Use the Funnel when a bandwidth bottleneck occurs:
• "high" memory accesses per instruction
• limited BW
• non-temporal-locality memory accesses

*private communication with: Moinuddin Qureshi

“Funnel”ing “Read-Once” data in storage

*Kang, Yangwook, Yang-suk Kee, Ethan L. Miller, and Chanik Park. "Enabling Cost-effective Data Processing with Smart SSD." In Mass Storage Systems and Technologies (MSST), 2013 IEEE 29th Symposium on, pp. 1–12. IEEE, 2013.
**K. Eshghi and R. Micheloni. "SSD Architecture and PCI Express Interface"

Typical SSD architecture*


Analytical model of the Funnel

[Diagram: Bandwidth BW_IN → Funnel → Bandwidth BW_OUT → post-process]

β = BW_OUT / BW_IN
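This model can be sketched as a bottleneck calculation. In the baseline every raw byte crosses the PCIe link to the CPU; with the Funnel, PCIe and the CPU see only the reduced stream, a fraction β of the raw data. The formulation and the bandwidth numbers below are illustrative assumptions, not the exact model from the paper:

```python
def baseline_throughput(ssd_bw, pcie_bw, cpu_bw):
    """Raw-data rate in the baseline: every stage handles the full stream."""
    return min(ssd_bw, pcie_bw, cpu_bw)

def funnel_throughput(ssd_bw, pcie_bw, cpu_bw, beta):
    """Raw-data rate with the Funnel: PCIe/CPU handle only beta of the stream."""
    return min(ssd_bw, pcie_bw / beta, cpu_bw / beta)

ssd, pcie, cpu = 20e9, 8e9, 10e9  # hypothetical bandwidths, bytes/s
for beta in (1.0, 0.5, 0.1):
    gain = funnel_throughput(ssd, pcie, cpu, beta) / baseline_throughput(ssd, pcie, cpu)
    print(beta, gain)  # -> 1.0, 2.0, 2.5: gain saturates once the SSD is the bottleneck
```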


Proposed Architecture


Baseline configuration: SSD storage → PCIe → CPU; the CPU performs both NTMA and TMA work (B = bandwidth at each link)

Funnel configuration: the Funnel inside the SSD performs the NTMA work; the CPU performs the TMA work

Funnel Performance

[Plot: performance improvement – the improvement drops as the CPU becomes the bottleneck; the curve depends on 1/PCIe_BW and 1/SSD_BW]



Funnel energy

[Plot: Funnel improvement, with regions where the SSD BW, the PCIe link, or the CPU is the bottleneck]

Energy grows as the performance advantage shrinks
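The energy trend can be sketched with the same kind of toy model: the Funnel pays a processing overhead on every raw byte, while the data-movement and CPU energy is paid only on the reduced fraction β; as β approaches 1 the savings vanish and the overhead can make the Funnel cost more than the baseline. All per-byte energies below are made-up illustrative constants:

```python
def baseline_energy_per_byte(e_move=0.5e-9, e_cpu=1.0e-9):
    """Baseline: every raw byte is moved to the CPU and processed there."""
    return e_move + e_cpu

def funnel_energy_per_byte(beta, e_move=0.5e-9, e_cpu=1.0e-9, e_funnel=0.3e-9):
    """Funnel: overhead on every raw byte; move/CPU cost only on the beta fraction."""
    return e_funnel + beta * (e_move + e_cpu)

for beta in (0.1, 0.5, 0.9):
    ratio = funnel_energy_per_byte(beta) / baseline_energy_per_byte()
    print(beta, ratio)  # ratio rises toward (and past) 1 as beta grows
```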

Funnel processor overhead



Solution: ?

Non-Temporal Memory Accesses should be processed as close as possible to the data source.
Data that exhibits Temporal Locality should use the current Memory Hierarchy.
Use Machine Learning (context aware*) to distinguish between the two phases.

Open questions:
SW model
Shared data
HW implementation
Computational requirements at the "Funnel"

*Reference: "Semantic Locality and Context Based Prefetching", Peled, Mannor, Weiser, Etsion, ISCA 2015

Summary

Memory access is a critical path in computing.

The Funnel should be used to:
Resolve the system's BW bottleneck for specific applications
Solve the system's BW issues for "Read Once" cases
Reduce data movement
Free up the system's memory resources (re-Spark)
Provide simple, energy-efficient engines at the front end

Issues
