Issues on using SSD under Linux (USENIX)
About SSD
Dongjun Shin, Samsung Electronics
Outline
- SSD primer
- Optimal I/O for SSD
- Benchmarking Linux FS on SSD
  • Case study: ext4, btrfs, xfs
- Design consideration for SSD
- What’s next?
  • New interfaces for SSD
  • Parallel processing of small I/O
SSD Primer (1/2)
- Physical unit of flash memory
  • NAND page (2–4kB) – unit for read & write
  • NAND block (64–128 NAND pages) – unit for erase (a.k.a. erasable block)
- Physical characteristics
  • Erase before rewrite
  • Sequential write within an erasable block
[Figure: the Flash Translation Layer maps the LBA space (visible to OS) onto the flash memory space]
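To make "erase before rewrite" concrete: an FTL typically services an overwrite out of place, writing the new data to the next free page (sequentially within a block) and redirecting a mapping table, leaving the stale page to be erased later. A toy page-mapping sketch; the structure and sizes are illustrative assumptions, not from the slides:

```c
/* ftl_sketch.c - toy page-mapping FTL: rewrites go out of place. */
#include <stdio.h>

#define NPAGES 8               /* physical pages in one erasable block */

int main(void)
{
    int map[NPAGES];           /* logical page -> physical page (-1 = unmapped) */
    int next_free = 0;         /* pages are written sequentially within the block */
    for (int i = 0; i < NPAGES; i++)
        map[i] = -1;

    /* Overwrite logical page 3 twice: flash cannot rewrite a page in place,
     * so the second write takes a fresh page and the old one awaits erase. */
    for (int rewrite = 0; rewrite < 2; rewrite++) {
        map[3] = next_free++;
        printf("logical page 3 -> physical page %d\n", map[3]);
    }
    return 0;
}
```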
SSD Primer (2/2)
- Internal organization: 2-dimensional (NxM parallelism)
  • N channels (striping) x M ways (pipelining)
  • Similar to RAID-0 (stripe size = sector or NAND page)
  • Effective page & block size is multiplied by up to NxM
[Figure: SSD controller running F/W (FTL) behind a host I/F (ex. SATA); 4 channels (Ch0–Ch3) x 4 chips (Chip0–Chip3), with consecutive logical pages striped across the channels and chips]
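One way to picture the striping is the sketch below: consecutive logical page numbers go round-robin across channels, then rotate across chips. The exact mapping inside a real SSD is FTL-specific; NCHAN, NWAY, and the formula here are assumptions for illustration only.

```c
/* stripe_map.c - sketch of the N x M layout: logical pages striped
 * RAID-0 style across channels, then pipelined across chips. */
#include <stdio.h>

#define NCHAN 4   /* N channels (striping) */
#define NWAY  4   /* M chips per channel (pipelining) */

int main(void)
{
    for (unsigned page = 0; page < 16; page++) {
        unsigned chan = page % NCHAN;            /* stripe across channels first */
        unsigned chip = (page / NCHAN) % NWAY;   /* then rotate across chips */
        printf("logical page %2u -> Ch%u, Chip%u\n", page, chan, chip);
    }
    return 0;
}
```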
Optimal I/O for SSD
- Key points
  • Parallelism: the larger the I/O request, the better
  • Match with physical characteristics
    • Alignment with page or block size of NAND*
    • Segmented sequential write (within an erasable block)
- What about Linux?
  • HDD also favors larger I/O: readahead, deferred aggregated write
  • Segmented FS layout: good if aligned with erasable block boundary
  • Write optimization: FS dependent (ex. allocation policy)

* Usually, partition layout is not aligned (1st partition at LBA 63); a quick check is sketched below
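The footnote's point in code: the classic DOS layout starts the first partition at LBA 63, which cannot be a multiple of any erasable-block size, while the talk's test partition at LBA 16384 (8MB) is aligned. A minimal sketch, assuming a 4MB effective erasable block (the real value is device-specific):

```c
/* align_check.c - is a partition start aligned to the erasable block? */
#include <stdio.h>
#include <stdint.h>

#define SECTOR_SIZE      512ULL       /* bytes per LBA sector */
#define ERASE_BLOCK_SIZE (4ULL << 20) /* assumed effective erasable block: 4MB */

int main(void)
{
    /* Classic DOS layout vs. the aligned test partition used in this talk. */
    uint64_t starts[] = { 63, 16384 };

    for (size_t i = 0; i < sizeof(starts) / sizeof(starts[0]); i++) {
        uint64_t offset = starts[i] * SECTOR_SIZE;
        printf("LBA %-6llu (offset %8llu bytes): %s\n",
               (unsigned long long)starts[i], (unsigned long long)offset,
               offset % ERASE_BLOCK_SIZE ? "misaligned" : "aligned");
    }
    return 0;
}
```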
Test environment (1/2)
- Hardware
  • Intel Core 2 Duo, 1GB RAM
- Software
  • Fedora 7 (Kernel 2.6.24)
  • Benchmark: postmark
- Filesystems
  • No journaling: ext2
  • Journaling: ext3, ext4, reiserfs, xfs
    • ext3, ext4: data=writeback, barrier=1[,extents]
    • xfs: logbsize=128k
  • COW, log-structured: btrfs (latest unstable, 4k block), nilfs (testing-8)
- SSD
  • Vendor M (32GB, SATA): read 100MB/s, write 80MB/s
  • Test partition starts at LBA 16384 (8MB, aligned)
Test environment (2/2)
- Postmark workload
  • Ref: Evaluating Block-level Optimization through the I/O Path (USENIX 2007)

Workload | File size | # of files (workset) | # of transactions | Total app read/write
SS       | 9–15K     | 10,000               | 100,000           | 630M/755M*
SL       | 9–15K     | 100,000              | 100,000           | 600M/1.8G
LS       | 0.1–3M    | 1,000                | 10,000            | 9.7G/12G
LL       | 0.1–3M    | 4,250                | 10,000            | 9G/17G

* Mostly write-only
Benchmark results (1/2)
- Small file size (SS, SL)
[Chart: transactions/sec (0–2500) under SS and SL for ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs]
Benchmark results (2/2)
- Large file size (LS, LL)
[Chart: transactions/sec (0–30) under LS and LL for ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs]
I/O statistics (1/2)
- Average size of I/O
[Chart: average I/O size (0–140 Kbytes) for reads and writes under SS, SL, LS, LL for ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs]
I/O statistics (2/2)
- Segmented sequentiality of write I/O (segment: 1MB)
[Chart: segmented write sequentiality (0–20%) under SS, SL, LS, LL for ext2, ext3, ext4, reiserfs, xfs, btrfs, nilfs; one series is annotated 100% in all four workloads]
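The talk does not spell out how this sequentiality figure is computed. A minimal sketch under one plausible definition (a write counts as segment-sequential when it starts exactly where the previous write to the same 1MB segment ended; the trace below is made up):

```c
/* seg_seq.c - estimate segmented sequentiality of a write trace. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define SEGMENT_SIZE (1ULL << 20)   /* 1MB segment, as in the talk */
#define MAX_SEGMENTS 4096

struct write_req { uint64_t offset, len; };   /* byte offset and length */

int main(void)
{
    struct write_req trace[] = {              /* made-up trace */
        { 0, 4096 }, { 4096, 4096 },          /* back-to-back in segment 0 */
        { SEGMENT_SIZE + 65536, 4096 },       /* jump into segment 1 */
        { 8192, 4096 },                       /* resumes segment 0 where it left off */
    };
    uint64_t next_off[MAX_SEGMENTS];
    unsigned seq = 0, total = 0;

    memset(next_off, 0xff, sizeof(next_off)); /* "no previous write" marker */

    for (size_t i = 0; i < sizeof(trace) / sizeof(trace[0]); i++) {
        uint64_t seg = trace[i].offset / SEGMENT_SIZE;
        if (seg >= MAX_SEGMENTS)
            continue;                         /* sketch only covers the first 4GB */
        if (next_off[seg] == trace[i].offset) /* continues that segment's stream? */
            seq++;
        next_off[seg] = trace[i].offset + trace[i].len;
        total++;
    }
    printf("segmented sequentiality: %.0f%% (%u of %u writes)\n",
           100.0 * seq / total, seq, total);
    return 0;
}
```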
Case study: ext4
- Condition
  • data=ordered, allocation: default/noreservation/oldalloc
[Chart: transactions/sec (0–1200) under SS and SL for ext4-wb, ext4-ord, ext4-nores, ext4-olda]
1. Almost no difference between allocation policies
2. Why is data=ordered better for SL?
Case study: btrfs
- Condition
  • Block size: 4k/16k, allocation: ssd option on/off
[Chart: transactions/sec (0–1800) under SS, SL, LS, LL for btrfs-4k, btrfs-16k, btrfs-ssd-4k]
1. 4k is better than 16k (sequentiality = 12% vs. 2%)
2. The ssd option is effective (10–40% improvement)
Case study: xfs
- Condition
  • Mount with barrier on/off
[Chart: transactions/sec (0–800) under SS, SL, LS, LL for xfs-bar, xfs-nobar]
Large barrier overhead...
Design consideration for SSD
- Lessons from flash FS (ex. logfs)
  • Sequential writing at multiple logging points
  • Wandering tree
    • Trade-off between sequentiality and amount of write
    • Cf. space map (Sun ZFS)
  • Need to optimize garbage collection overhead
    • Either in the FS itself or in the FTL in SSD
Next topic: end-to-end optimization
- Exchange info with SSD (trim, SSD identification)
- Make best use of parallelism
New interfaces for SSD (t13.org)
- Trim command
  • Lets the device know which LBA range is no longer used
    • Helpful for optimizing the FTL (a user-space sketch of the idea follows below)
  • Should be passed through the stack: FS -> bio -> scsi -> libata
    • Passing a bio with no data
    • What about I/O reordering & I/O queuing?
- SSD identification (added to "ATA identify")
  • Reports the size of the page and erasable block
    • Physical or effective?
  • Useful for FS and volume manager
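Trim as proposed here later surfaced in Linux as the discard machinery. A minimal user-space sketch of the "tell the device this LBA range is unused" idea via the BLKDISCARD ioctl, which post-dates this talk (Linux 2.6.28+):

```c
/* discard.c - tell the device an LBA range is unused via BLKDISCARD.
 * DESTRUCTIVE: the device is free to throw the data away. Illustration only. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>          /* BLKDISCARD */

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s <blockdev> <offset-bytes> <length-bytes>\n",
                argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }

    uint64_t range[2] = { strtoull(argv[2], NULL, 0),    /* start  */
                          strtoull(argv[3], NULL, 0) };  /* length */
    if (ioctl(fd, BLKDISCARD, &range) < 0)
        perror("BLKDISCARD");

    close(fd);
    return 0;
}
```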
Parallel processing of small I/O
- Make better use of I/O queuing (TCQ or NCQ)
  • Desktop environment? Barrier?
[Figure: four small requests A, B, C, D dispatched to a 4-channel SSD; without I/O queuing the chips sit mostly idle and service takes 4 steps, with I/O queuing the chips work in parallel and service takes 2 steps]
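A toy model of the step counts in the figure: without queuing the device sees one request at a time, so N requests take N steps; with queuing, requests on distinct channels overlap and the total is the depth of the busiest channel. The channel assignments below are made up to reproduce the figure's 4-vs-2 outcome:

```c
/* queue_sim.c - toy step-count model for the figure above. */
#include <stdio.h>

#define NCHAN 4

int main(void)
{
    int chan[] = { 0, 1, 1, 3 };   /* made-up channels of requests A, B, C, D */
    int n = sizeof(chan) / sizeof(chan[0]);

    /* Without queuing the device sees one request at a time: n steps. */
    printf("without I/O queuing: %d steps\n", n);

    /* With queuing, requests on different channels proceed in the same step,
     * so the total is the depth of the busiest channel. */
    int load[NCHAN] = { 0 }, steps = 0;
    for (int i = 0; i < n; i++)
        if (++load[chan[i]] > steps)
            steps = load[chan[i]];
    printf("with I/O queuing:    %d steps\n", steps);
    return 0;
}
```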
Summary
- Optimization for SSD
  • Alignment is important
  • Segmented sequentiality
  • Make better use of parallelism (either small or large I/O)
    • I/O barrier may stall the pipelined processing
- What can you do?
  • File system: alignment, allocation policy, design (ex. COW)
  • Block layer: bio w/ hint, barrier, I/O queueing, scheduler(?)
  • Volume manager: alignment, allocation
  • Virtual memory: readahead
References
- T13 spec for SSD
  • http://www.t13.org/documents/UploadedDocuments/docs2007/e07153r0Soild_State_Drive_Identify_Proposal.doc
  • http://www.t13.org/documents/UploadedDocuments/docs2007/e07154r0Notification_for_Deleted_Data_Proposal_for_ATAACS2.doc
- Introduction to SSD and flash memory
  • http://download.microsoft.com/download/a/f/d/afdfd50d6eb9425e84e1b4085a80e34e/WNST432_WH07.pptx
  • http://download.microsoft.com/download/d/f/6/df6accd54bf249848285f4f23b7b1f37/WinHEC2007_Micron_NAND_FlashMemory.doc
  • http://download.microsoft.com/download/a/f/d/afdfd50d6eb9425e84e1b4085a80e34e/SSS486_WH07.pptx
- FTL description & optimization
  • BPLRU: A Buffer Management Scheme for Improving Random Writes in Flash Storage (FAST ’08)
Appendix. I/O Pattern (chart-only slides)
- SS workload – ext4, xfs
- SS workload – btrfs, nilfs
- SL workload – ext4, xfs
- SL workload – btrfs, nilfs
- LS workload – ext4, reiserfs, xfs
- LS workload – btrfs, nilfs
- LL workload – ext4, reiserfs, xfs
- LL workload – btrfs, nilfs