Luc Bouganim, Björn Þór Jónsson, Philippe Bonnet

SRBIAU, Kurdistan Campus

Danesh Zandi, Afshin Rahmany & Mohamad Kavosi

Spring 12

Assistant Professor: Kyumars Sheykh Esmaili

uFLIP: Understanding Flash IO Patterns

Overview

1. Introduction
   1. Motivation
   2. Contributions
   3. How Flash Devices Work
   4. Why the State Matters

2. Content
   1. Definitions
   2. The Benchmark
   3. Benchmarking Methodology
   4. Device Evaluations

3. Review
   1. Problems
   2. Evaluation
   3. Conclusion

Introduction: Motivation

Flash devices (vs. HDDs)

Faster

• Lower latency

• Higher throughput

More robust

Soon as big

More complex to handle

Read/Write → Program/Erase

New/adapted algorithms?

We need to understand the devices!

Introduction: Contributions

The uFLIP Benchmark

Benchmarking Methodology

Device Evaluations

Consisting of 9 micro benchmarks

How to apply the benchmark

Example evaluations of a set of devices

Introduction: How Flash Devices Work

Units

Page: ~2 KB

Block: 64 × 2 KB ≈ 128 KB

Read

Program

Default state: 1

Program → 0

Erase

Back to default

Only possible 10⁵ to 10⁶ times

Per block → slow
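The read/program/erase rules above can be illustrated with a toy model. This is only a sketch for intuition, not how any real chip is implemented: the class name `Block`, the 8-bit page width, and the wear limit constant are assumptions chosen to keep the demo readable.

```python
# Toy model of flash program/erase semantics (illustrative only).
PAGE_BITS = 8            # assumption: real pages are ~2 KB; 8 bits keeps output short
PAGES_PER_BLOCK = 64
MAX_ERASES = 100_000     # lower end of the 10^5 to 10^6 range from the slide
ERASED = (1 << PAGE_BITS) - 1   # default state: every bit is 1


class Block:
    def __init__(self):
        self.pages = [ERASED] * PAGES_PER_BLOCK
        self.erase_count = 0

    def read(self, page):
        return self.pages[page]

    def program(self, page, data):
        # Programming can only clear bits (1 -> 0), so the result is a bitwise AND.
        self.pages[page] &= data
        return self.pages[page]

    def erase(self):
        # Erase resets the whole block, never a single page, and wears it out.
        if self.erase_count >= MAX_ERASES:
            raise RuntimeError("block worn out")
        self.pages = [ERASED] * PAGES_PER_BLOCK
        self.erase_count += 1


block = Block()
block.program(0, 0b10101010)
block.program(0, 0b11110000)    # cannot turn a 0 back into a 1
print(bin(block.read(0)))       # 0b10100000
block.erase()
print(bin(block.read(0)))       # 0b11111111
```

Overwriting a page in place therefore needs an erase of the surrounding block first, which is why updates are expensive.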

Introduction: How Flash Devices Work

Flash Chips

Block Manager

Wear leveling

Maps LBA to flash page

Possibly trades in-place updates for writes into free pages

Possibly asynchronous page reclamation
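A minimal sketch of the block manager idea, assuming a page-level map and synchronous reclamation. Real devices keep these details hidden; `ToyBlockManager` and all of its fields are hypothetical, and wear leveling is not modeled here.

```python
# Toy block manager: out-of-place writes with an LBA -> flash page map.
class ToyBlockManager:
    def __init__(self, num_pages):
        self.flash = [None] * num_pages       # physical pages
        self.free = list(range(num_pages))    # erased pages, ready to be written
        self.stale = set()                    # pages holding outdated copies
        self.lba_to_page = {}                 # logical-to-physical mapping

    def write(self, lba, data):
        if not self.free:
            self.reclaim()                    # this write pays for the cleanup
        page = self.free.pop(0)
        self.flash[page] = data               # write into a free page, not in place
        old = self.lba_to_page.get(lba)
        if old is not None:
            self.stale.add(old)               # the previous copy becomes garbage
        self.lba_to_page[lba] = page

    def read(self, lba):
        return self.flash[self.lba_to_page[lba]]

    def reclaim(self):
        # Assumption: a real device erases whole blocks and may do so asynchronously.
        if not self.stale:
            raise RuntimeError("device full")
        for page in sorted(self.stale):
            self.flash[page] = None
            self.free.append(page)
        self.stale.clear()


mgr = ToyBlockManager(4)
for value in "abc":
    mgr.write(0, value)      # same LBA, three different physical pages
mgr.write(1, "x")
mgr.write(0, "d")            # no free page left: reclamation runs first
print(mgr.read(0), mgr.read(1))   # d x
```

The last write is slower than the earlier ones only because of the device's internal state, which is the effect the next slide describes.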

Introduction: Why the State Matters

General principles are well known

Details are not.

Flash devices are black boxes.

# free pages unknown

Time of next erase unknown

Cost of I/O operation is non-uniform in time

Depends on state of device

Content: Definitions

I/O operation

Time, size, LBA, read/write

Baseline patterns

Sequential/random read, sequential/random write

Time

Consecutive, pause, burst

Logical Block Address

Sequential, random, ordered, partitioned

Target offset/size, shift

Others

IOIgnore, IOCount
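As a rough illustration of how the LBA parameters combine, the two baseline address patterns could be generated as below. This is a sketch under assumptions: the function names and the exact arithmetic are this example's own reading of "target offset/size, shift", not formulas copied from the paper.

```python
import random


def sequential_lbas(io_count, io_size, target_size, target_offset=0, io_shift=0):
    """Yield byte addresses that walk the target space and wrap around at its end."""
    for i in range(io_count):
        yield target_offset + io_shift + (i * io_size) % target_size


def random_lbas(io_count, io_size, target_size, target_offset=0, io_shift=0, seed=0):
    """Yield byte addresses at random io_size-aligned slots inside the target space."""
    rng = random.Random(seed)            # fixed seed so a run can be repeated
    slots = target_size // io_size
    for _ in range(io_count):
        yield target_offset + io_shift + rng.randrange(slots) * io_size


print(list(sequential_lbas(4, io_size=512, target_size=1024)))   # [0, 512, 0, 512]
print(list(random_lbas(4, io_size=512, target_size=4096)))
```

A non-zero `io_shift` moves every address off its natural alignment without changing the pattern.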

Content: Definitions

Content: The Benchmark(s)

Granularity

I/O size

Locality

Target size

Content: The Benchmark(s)

Partitioning

Target space divided into partitions

Operations within partition are sequential

Order

Linear increase/decrease, in-place

Parallelism

Target space divided into subsets

Each accessed by different process
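The partitioning and order patterns can be sketched as address generators. The round-robin visiting order across partitions and the `incr` parameter are assumptions made for this illustration; the slide only states that accesses within a partition are sequential and that the order pattern increases, decreases, or stays in place.

```python
def partitioned_lbas(io_count, io_size, target_size, partitions, target_offset=0):
    """Visit partitions round-robin; addresses inside each partition are sequential."""
    part_size = target_size // partitions
    for i in range(io_count):
        part = i % partitions                            # which partition this IO hits
        offset = (i // partitions) * io_size % part_size # sequential position inside it
        yield target_offset + part * part_size + offset


def ordered_lbas(io_count, io_size, incr, target_offset=0):
    """incr = 1: increasing, incr = -1: decreasing, incr = 0: in-place."""
    for i in range(io_count):
        yield target_offset + incr * i * io_size


print(list(partitioned_lbas(6, io_size=512, target_size=4096, partitions=2)))
# [0, 2048, 512, 2560, 1024, 3072]
print(list(ordered_lbas(3, io_size=512, incr=-1, target_offset=2048)))
# [2048, 1536, 1024]
print(list(ordered_lbas(3, io_size=512, incr=0)))
# [0, 0, 0]
```

For the parallelism pattern, each process would run one such generator over its own subset of the target space.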

Content: Benchmarking Methodology

Device state

Out of the box, 16 KB write: 1 ms

After writing whole device: 8 ms

Well defined initial state

“Writing the whole flash device completely yields a well-defined state.”

Start-up Phase

Defined by IOIgnore

Running Phase

Defined by IOCount – IOIgnore
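A minimal sketch of how the two phases split a run of measurements. The response times below are made-up numbers for illustration only, not results from the paper, and `running_phase` is a hypothetical helper name.

```python
from statistics import mean


def running_phase(response_times, io_ignore):
    """Drop the first io_ignore measurements (start-up phase) and keep the rest."""
    return response_times[io_ignore:]


# Invented response times in ms: a fast start-up phase, then steady state.
times_ms = [0.4, 0.4, 0.5, 8.1, 7.9, 8.0, 8.2]
io_count = len(times_ms)
io_ignore = 3

running = running_phase(times_ms, io_ignore)
print(len(running))                       # 4, i.e. IOCount - IOIgnore
print(mean(times_ms), mean(running))      # the start-up phase skews the overall mean
```

Only the running-phase measurements should feed the reported statistics.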

Content: Device Evaluations

Devices are from 2009

Range from USB sticks and IDE modules to SSDs

From $12 to $943 and 2 GB to 32 GB

More expensive → faster

Parallelism has no effect

Review: Problems

They vary only one parameter at a time

Interactions between parameters not captured

• e.g. what if locality and partitioning work well together?

• Why not 2^k factorial design?

Multidimensional graphs can be analyzed

Full factorial design is not feasible

“Writing the whole flash device completely yields a well-defined state.”

Next paragraph: “...by performing random IOs of random size (ranging from 0.5 KB to the flash block size, 128 KB) on the whole device.”

“Writes” or “random IOs”?

What does “the whole device” mean?

• All LBAs? All flash pages (not possible)? Total size?
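To make the 2^k factorial question concrete: with two levels per parameter, k parameters need 2^k runs, and every combination is covered, so pairwise interactions become visible. The parameter names and levels below are hypothetical and chosen only for this sketch.

```python
from itertools import product

# Hypothetical low/high levels for three benchmark parameters (k = 3).
params = {
    "io_size_kb": (4, 128),
    "target_size_mb": (16, 1024),
    "partitions": (1, 16),
}

# One run per combination of levels: 2 * 2 * 2 = 8 configurations.
runs = [dict(zip(params, levels)) for levels in product(*params.values())]

print(len(runs))     # 8
print(runs[0])       # {'io_size_kb': 4, 'target_size_mb': 16, 'partitions': 1}
print(runs[-1])      # {'io_size_kb': 128, 'target_size_mb': 1024, 'partitions': 16}
```

Varying one parameter at a time from a single baseline needs only k + 1 runs (4 here), which is cheaper but never measures two parameters at their non-default levels together.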

Review: Evaluation

The paper was interesting to read.

Of the 3 contributions:

The results are obsolete (but interesting).

The methodology is (mostly) well-known benchmarking best practices.

The benchmark is still valid and useful.

More explanations for the results

Review: Conclusion

Many areas for improvement

Automation

Capturing interaction

SSDs are getting more and more important

Evaluation with today's devices

Parallelism?

No alternative offers as much information

End of presentation