Kilmo Choi rlfah926@naver.com

Post on 25-Feb-2016


Embedded System Lab.


A Software Memory Partition Approach for Eliminating Bank-level Interference in Multicore Systems

Lei Liu, Zehan Cui, Mingjie Xing, Yungang Bao, Mingyu Chen, Chengyong Wu


Contents

Background and Motivation

Bank-Level Partition Mechanism(BPM)

Results

Conclusion

Reference


Background and Motivation

Memory bank

- A set of memory with the same access speed

Multicore platform

- Multiple banks can serve memory requests independently and concurrently (bank-level parallelism)


Background and Motivation

Row buffer conflict

- Causes performance degradation (throughput slowdown and unfairness)

- e.g., the row buffer hit rate drops from over 60% with 1 core to 35% with 16 cores

[Figure: cores reading and writing (R/W) a bank that holds Rows 0-3 behind a row buffer]

- Core 0 accesses data in Row 1: activate operation

- Core 0 accesses the same page: row-buffer hit

- Core 1 accesses data in Row 3: row-buffer conflict (precharge operation, then activate operation)
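The access sequence in the figure can be replayed with a small model. This is a minimal sketch, assuming one open row per bank and an open-page policy; the function name `classify_accesses` and the `(core, bank, row)` tuple layout are illustrative choices, not taken from the slides.

```python
from collections import Counter


def classify_accesses(accesses):
    """Classify each (core, bank, row) access against a per-bank row buffer.

    Assumed model: every bank keeps exactly one row open.
    - "activate": the bank had no open row yet
    - "hit":      the requested row is already in the row buffer
    - "conflict": another row is open (precharge, then activate)
    """
    open_row = {}          # bank id -> row currently held in the row buffer
    outcomes = Counter()
    for _core, bank, row in accesses:
        if bank not in open_row:
            outcomes["activate"] += 1
        elif open_row[bank] == row:
            outcomes["hit"] += 1
        else:
            outcomes["conflict"] += 1
        open_row[bank] = row
    return outcomes


# The three accesses from the figure, all to bank 0.
figure = classify_accesses([(0, 0, 1), (0, 0, 1), (1, 0, 3)])

# Two cores interleaving on one shared bank versus on two separate banks.
shared = classify_accesses([(0, 0, 1), (1, 0, 3)] * 4)
separate = classify_accesses([(0, 0, 1), (1, 1, 3)] * 4)
```

In this model the interleaved stream on a shared bank conflicts on every access after the first, while the same stream spread over two banks hits after each bank's first activate.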


Bank-Level Partition Mechanism (BPM)

- Numerous new memory scheduling algorithms have been proposed to address the interference problem

- However, these algorithms usually employ complex scheduling logic and need hardware modifications to memory controllers

Overview of BPM

- The OS memory management system uses a page-coloring mechanism to partition banks into several groups and maps each thread (process) to a specific bank group

- Address mapping policy
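The overview can be sketched as a toy page allocator. This is a minimal illustration under stated assumptions, not the BPM kernel implementation: the bank-bit positions in `BANK_BITS`, the class `BankPartitionAllocator`, and the thread names are all hypothetical.

```python
# Hypothetical positions of the bank bits inside a page frame number (PFN).
# Real positions depend on the memory controller's address mapping.
BANK_BITS = (1, 2, 9)


def bank_color(pfn):
    """Pack the bank bits of a PFN into a small integer (the bank color)."""
    color = 0
    for i, bit in enumerate(BANK_BITS):
        color |= ((pfn >> bit) & 1) << i
    return color


class BankPartitionAllocator:
    """Toy allocator: free pages are binned by bank color, and each thread
    may only receive pages whose color belongs to its bank group."""

    def __init__(self, free_pfns, groups):
        self.groups = groups                 # thread -> set of allowed colors
        self.free = {}                       # color -> list of free PFNs
        for pfn in free_pfns:
            self.free.setdefault(bank_color(pfn), []).append(pfn)

    def alloc_page(self, thread):
        for color in sorted(self.groups[thread]):
            if self.free.get(color):
                return self.free[color].pop()
        raise MemoryError(f"no free page in the bank group of {thread!r}")


groups = {"thread_a": {0, 1, 2, 3}, "thread_b": {4, 5, 6, 7}}
allocator = BankPartitionAllocator(range(1024), groups)
pages_a = [allocator.alloc_page("thread_a") for _ in range(50)]
pages_b = [allocator.alloc_page("thread_b") for _ in range(50)]
```

Because the two bank groups are disjoint, no page handed to `thread_a` shares a bank with a page handed to `thread_b`.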


Bank-Level Partition Mechanism(BPM)

[Figure: four cores and eight banks; each bank holds Rows 0-3 and has its own row buffer]


Bank-Level Partition Mechanism (BPM)

Advantages

- Row buffer conflicts ↓, row buffer hits ↑

- BPM is an entirely software approach: flexible

- It is easier for the OS to monitor a thread's behavior than for hardware

- Bank-level conflicts can be fully eliminated by exclusively mapping a thread's data to specific banks

- How much does the number of available banks affect a thread's performance?


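Bank-Level Partition Mechanism (BPM)

Discovering the bank bits with a software method (algorithm)

- Step 1: FOR each address bit x, alternately access (uncached) two addresses that differ only in bit x (x = 0 and x = 1)

  - Higher latency (row miss): x is added to Row{ }

  - Otherwise (row hit): x is added to Remain{ }

- Step 2: FOR each bit y in Remain{ } (FOR each bit x in Row{ }), alternately access two addresses that differ in bits y and x

  - Higher latency (row miss in the same bank): y is added to Column{ }

  - Otherwise (the two addresses are mapped to different banks): y is added to BANK{ }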
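The two-step latency test can be tried against a simulated memory. This is a rough sketch under strong assumptions: `SimulatedDram` is a hypothetical stand-in for real uncached accesses, the latencies `HIT_NS` and `MISS_NS` are made up, each address bit belongs to exactly one field (no XOR-based bank hashing), and a single row bit is used in step 2 instead of the nested loop.

```python
from dataclasses import dataclass, field

HIT_NS, MISS_NS = 15, 45   # made-up latencies for a row hit and a row miss


@dataclass
class SimulatedDram:
    """Hypothetical DRAM whose address-bit layout is hidden from the prober."""
    column_bits: frozenset
    bank_bits: frozenset
    row_bits: frozenset
    open_rows: dict = field(default_factory=dict)   # bank -> open row

    def _extract(self, addr, bits):
        return tuple((addr >> b) & 1 for b in sorted(bits))

    def access(self, addr):
        bank = self._extract(addr, self.bank_bits)
        row = self._extract(addr, self.row_bits)
        if self.open_rows.get(bank) == row:
            return HIT_NS
        self.open_rows[bank] = row                   # precharge + activate
        return MISS_NS


def avg_latency(dram, a, b, rounds=100):
    """Average latency of alternately accessing addresses a and b."""
    total = sum(dram.access(a) + dram.access(b) for _ in range(rounds))
    return total / (2 * rounds)


def discover_bits(dram, all_bits, threshold=30):
    row, remain = set(), set()
    for x in all_bits:                               # step 1: flip one bit
        slow = avg_latency(dram, 0, 1 << x) > threshold
        (row if slow else remain).add(x)
    column, bank = set(), set()
    x = min(row)                                     # any known row bit
    for y in remain:                                 # step 2: flip y and x
        slow = avg_latency(dram, 0, (1 << y) | (1 << x)) > threshold
        (column if slow else bank).add(y)
    return row, column, bank


dram = SimulatedDram(column_bits=frozenset({0, 1, 2}),
                     bank_bits=frozenset({3, 4}),
                     row_bits=frozenset({5, 6, 7}))
row, column, bank = discover_bits(dram, range(8))
```

On real hardware the measurement would need uncached accesses and many repetitions to separate the two latency classes; the threshold here only works because the simulated latencies are noise-free.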


Results

Environments

- 4 cores, 2.8 GHz Intel Core i7-860 processor, 8 GB DDR3 main memory with 64 banks, 5 bank bits

- CentOS Linux 5.4 with kernel 2.6.32.15

- SPEC CPU2006 benchmarks


Results

Overall system performance

Results

Page-Policy and Power

Results

BPM vs. Cache-Partition-Only

- The correlation between BPM improvements and per-core bandwidth

Conclusion

- BPM is a new approach that eliminates interference between threads and improves overall system performance

- BPM achieves this by assigning different groups of banks to different threads, which eliminates inter-thread bank-level interference

- This reduces row buffer misses as well as the energy consumption of the memory system


Reference

- J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, and P. Sadayappan. Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems. In HPCA-14, 2008.

- Junghoon Kim, Junghan Kim, and Youngik Eom. A Page Coloring Scheme through Page Cache Separation for Improving Cache Performance. In NIPA-2010.

- Dimitris Kaseridis, Jeffrey Stuecheli, and Lizy Kurian John. Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era. In MICRO 44, 2011.


Appendix: Page Coloring

[Figure: address translation maps the virtual address (virtual page number, page offset) to the physical address (physical page number, page offset); the address seen by the physically indexed cache consists of cache tag, set index, and block offset; the page color bits are the bits shared by the physical page number and the set index, and they are under OS control]

- Physically indexed caches are divided into multiple regions (colors).

- All cache lines in a physical page are cached in one of those regions (colors).

- The OS can control the page color of a virtual page through address mapping (by selecting a physical page with a specific value in its page color bits).
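The page color bits follow from the cache geometry, which a few lines of arithmetic can show. This is a minimal sketch, assuming a physically indexed, set-associative cache with power-of-two sizes; the function names and the 8 MiB, 16-way, 4 KiB-page example are illustrative and not taken from the slides.

```python
def num_page_colors(cache_bytes, ways, page_bytes):
    """Number of page colors: the bytes covered by one cache way,
    divided by the page size (at least one color)."""
    way_bytes = cache_bytes // ways
    return max(1, way_bytes // page_bytes)


def page_color(phys_addr, cache_bytes, ways, page_bytes):
    """Color of the physical page containing phys_addr: the low bits of
    the page frame number that overlap the cache set index."""
    pfn = phys_addr // page_bytes
    return pfn % num_page_colors(cache_bytes, ways, page_bytes)


# Illustrative geometry: 8 MiB cache, 16 ways, 4 KiB pages.
CACHE, WAYS, PAGE = 8 * 2**20, 16, 4096
colors = num_page_colors(CACHE, WAYS, PAGE)
color_of_frame_5 = page_color(5 * PAGE + 100, CACHE, WAYS, PAGE)
color_of_frame_129 = page_color(129 * PAGE, CACHE, WAYS, PAGE)
```

With this geometry, frames 1 and 129 share a color and therefore compete for the same cache region, which is the property the OS exploits when it chooses physical pages.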


Appendix: Page Coloring

[Figure: physical pages are grouped by page color (1, 2, 3, 4, ..., i, i+1, i+2, ...) and mapped by color onto the physically indexed cache; shown for Process 1 and Process 2]