agrawal (4)
-
Upload
noahwatkins7595 -
Category
Documents
-
view
226 -
download
0
Transcript of agrawal (4)
-
8/9/2019 agrawal (4)
1/35
Generating Realistic Impressions
for File-System BenchmarkingNitin Agrawal
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
-
8/9/2019 agrawal (4)
2/35
For better or for worse,
benchmarks shape a fieldDavid Patterson
2
-
8/9/2019 agrawal (4)
3/35
-
8/9/2019 agrawal (4)
4/35
FS images in past: use what is convenient
Randomly generated files ofseveral MB (FAST 08)1000 files in 10 dirs w/ random data (SOSP 03)
188GB and 129GB volumes in Engg dept (OSDI 99)
5-deep tree, 5 subdirs, 10 8KB files in each (FAST 04)
10702 files from /usr/local, size 354MB (SOSP 01)
Typical desktop file system w/ no description (SOSP 05)
1641 files, 109 dirs, 13.4 MB total size (OSDI 02)
-
8/9/2019 agrawal (4)
5/35
Performance offindoperation
5
Relative
Time
Taken
Disk layout & cache state
File-system logical
organization
-
8/9/2019 agrawal (4)
6/35
Problem scope
Characteristics of file-system images have strong impact on performance
We need to incorporate representative file-system images in benchmarking & design
6
How to create representativefile-system images?
-
8/9/2019 agrawal (4)
7/35
Requirements for creating FS images
Access to data on file systems and disk layout Properties of file-system metadata [Satyanarayan81,
Mullender84, Irlam93, Sienknecht94, Douceur99, Agrawal07]Disk fragmentation [Smith97]More such studies in future? A technique to create file-system images that
isRepresentative: given a set of input distributionsControllable: supply additional user constraints Reproducible: control & report internal parametersEasy to use: for widespread adoption and consensus
7
-
8/9/2019 agrawal (4)
8/35
Introducing Impressions Powerful statistical framework to generate
file-system images
Takes properties of file-system attributes as input
Works out underlying statistical details of the image Mounted on a disk partition for real benchmarkingSatisfies the four design goals
Applying Impressions gives useful insightsWhat is the impact on performance and storage size?How does an application behave on a real FS image?
8
-
8/9/2019 agrawal (4)
9/35
Outline
Introduction Generating realistic file-system images
Applying Impressions: Desktop search
Conclusion
9
-
8/9/2019 agrawal (4)
10/35
Overview ofImpressions
10
Impressions
-
8/9/2019 agrawal (4)
11/35
Properties of file-system metadataFive-year study of file-system metadata [FAST07]
(Agrawal, Bolosky, Douceur, Lorch)
Used as exemplar for metadata properties in Impressions
11
-
8/9/2019 agrawal (4)
12/35
Features of Impressions Modes of operation for different usages
Basic mode: choose default settings for parametersAdvanced mode: several individually tunable knobs
Thorough statistical machinery ensures accuracyUses parameterized curve fitsAllows arbitrary user constraints
Built-in statistical tests for goodness-of-fit
Generates namespace, metadata, file content, anddisk fragmentation using above techniques
12
-
8/9/2019 agrawal (4)
13/35
Creating valid metadata Creating file-system namespace
Uses Generative Model proposed earlier [FAST 07]Explains the process of directory tree creationAccurately regenerates distribution of directory
size and of directory depth
13
-
8/9/2019 agrawal (4)
14/35
Creating namespace
14
Directory tree Monte Carlo runIncorporates dirs by depth
and dirs by subdir counti
Probability of parent selection Count(subdirs)+2
50
60
70
80
90
100
0 2 4 6 8 10 12 14 16Cumulative%o
fdirectories
Count of subdirectories
Directories by Subdirectory Count
DG
Dirs by subdir count
00.020.040.060.08
0.10.120.140.160.18
0 2 4 6 8 10 12 14 16Fraction
ofdirectories
Namespace depth (bin size 1)
Directories by Namespace Depth
DG
Dirs by namespace depth
DatasetGenerated
-
8/9/2019 agrawal (4)
15/35
Creating valid metadata Creating file-system namespace Creating files: stepwise process
File size, file extension, file depth, parent directoryUses statistical models & analytical approximations
15
-
8/9/2019 agrawal (4)
16/35
Example: creating realistic file sizes
Pure lognormal distribution no longer good fit Hybrid model: lognormal body, Pareto tail
Fits observed data more accurately, used to recreatefile sizes in Impressions
Con
tribution
tousedspace
Containing file size (bytes, log scale)
LognormalHybrid
16
File Sizes
-
8/9/2019 agrawal (4)
17/35
Creating files
17
File Size ModelLognormal body, Pareto tail
Captures bimodal curveiS2S4S6S8S9 S1S3S5S7
0
0.02
0.04
0.06
0.08
0.1
0.12
0 8 2K 512K 512M128G
Fractio
n
ofbytes
File Size (bytes, log scale)
Files by Containing Bytes
DG
Files by containing bytes
-
8/9/2019 agrawal (4)
18/35
Creating files
18
File Extensions Percentile valuesTop 20 extensions account
for 50% of files and bytesiS2S4S6S8S9 S1S3S5S7 E2E4E6E8E9 E3E5E7 E1
0
0.2
0.4
0.6
0.8
1
Desired Generated
Fractio
n
offiles
Top Extensions by Count
cppdllexegifhhtm
jpgnulltxt
others
cppdllexegifhhtm
jpgnulltxt
others
Top extensions by count
-
8/9/2019 agrawal (4)
19/35
Creating files
19
File DepthPoissonMultiplicative model along
with bytes by depth
00.020.040.06
0.080.1
0.120.140.16
0 2 4 6 8 10 12 14 16
Fraction
offiles
Namespace depth (bin size 1)
Files by Namespace Depth
DG
Files by namespace depth
16KB
64KB
256KB
768KB2MB
0 2 4 6 8 10 12 14 16Meanby
tesperfile
(log
scale)
Namespace depth (bin size 1)
Bytes by Namespace Depth
DG
Bytes by namespace depth
iS2S4S6S8S9 S1S3S5S7 E2E4E6E8E9 E3E5E7 E1D2D4D6D8D9 D3D5D7 D1
-
8/9/2019 agrawal (4)
20/35
Creating files
20
Parent Dir Inverse Polynomial Satisfies distribution of dirs
with file count
0
0.05
0.1
0.15
0.2
0.25
0 2 4 6 8 10 12 14 16
Fraction
offiles
Namespace depth (bin size 1)
Files by Namespace Depth(with Special Directories)
DG
Files by namespace depthw/ special dirs
i
-
8/9/2019 agrawal (4)
21/35
Resolving arbitrary constraints
21
-
8/9/2019 agrawal (4)
22/35
0
0.05
0.1
0.15
8 2K 512K 8M
Fraction
offiles
File Size (bytes, log scale)
C
0
0.05
0.1
0.15
8 2K 512K 8M
Fraction
offiles
File Size (bytes, log scale)
O
Resolving arbitrary constraints
Accurate both for the sum and the distribution22
Original Constrained
Contrived
for sum
Constraint: Given count of files & size distribution, ensure
sum of file sizes matches a desired total file system size
-
8/9/2019 agrawal (4)
23/35
Resolving arbitrary constraints Arbitrarily specified on file system parameters Variant of NP-complete Subset Sum Problem
Approximation algorithm based solution (in paper)Oversampling to get additional sample valuesLocal improvement to iteratively converge to the
desired sum by identifying best-fit in current sample While constraints are satisfied, constrained
distribution also retains original characteristics23
-
8/9/2019 agrawal (4)
24/35
Interpolation and extrapolation Why dont we just use available data values?
Limited to empirical data in input dataset
What-if analysis beyond available dataset
Efficient to maintain compact curve fits and useinterpolation/extrapolation instead of all data
Technique: Piecewise interpolation
24
-
8/9/2019 agrawal (4)
25/35
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0 2 832
128
512
2K
8K
32K
128K
512K
2M
8M
32M
128M
512M
2G
8G
32G
128G
Fractiono
fbytes
File Size
Piecewise Interpolation
100 GB
50 GB
10 GB
Segment 19Segment 19Segment 19Segment 19
0
0.02
0.04
0.06
0 50 100SegmentValue
File System Size (GB)
Interpola9on: Seg 1
Interpolation technique & accuracyPiecewise Interpolation
25
Each distribution broken down into segments Data points within a segment used for curve fit Combine segment interpolations for new curve
0
0.02
0.04
0.06
0.08
0.1
0.12
8 2K 512K 128M 32G
Fraction
ofbytes
File Size
Extrapolation (125 GB)
RE
File Size extrapolation 125GB
0
0.02
0.04
0.06
0.08
0.1
0.12
8 2K 512K 128M 32G
Fraction
ofbytes
File Size
Interpolation (75 GB)
RI
File Size interpolation 75GBReal
InterpolatedExtrapolated
Real
-
8/9/2019 agrawal (4)
26/35
File content Files having natural language content
Word-popularity model (heavy tailed)
Word-length frequency model (for the long tail)
Content for other files (mp3, gif, mpeg etc)Impressions generates valid header/footerUses third-party libraries and software
26
-
8/9/2019 agrawal (4)
27/35
Disk layout and fragmentation
27
-
8/9/2019 agrawal (4)
28/35
Disk layout and fragmentation Simplistic technique
Layout Scorefor degree of fragmentation [Smith97]
Pairs of file create/delete operations till desiredlayout score is achieved In future more nuanced ways are possible
Out-of-order file writes, writes with long delaysAccess to file-system specific interfaces
FIPMAP in Linux, XFS_IOC_GETBMAP for XFSPerhaps a tool complementary to Impressions
28
File 1 File 2
1 non-contiguous block (out of 8)File Layout Score = 7/8 All blocks contiguousFile Layout Score = 1 (6/6)
-
8/9/2019 agrawal (4)
29/35
Outline
Introduction Generating realistic file-system images
Applying Impressions: Desktop search
Conclusion
29
-
8/9/2019 agrawal (4)
30/35
Applying Impressions Case study: desktop search
Google desktop for linux (GDL) and Beagle
Metrics of interest:
Size of index, time to build initial search indexIdentifying application bugs and policies
GDL doesnt index content beyond 10-deep Computing realistic rules of thumb
Overhead of metadata replication?30
-
8/9/2019 agrawal (4)
31/35
-
8/9/2019 agrawal (4)
32/35
Impact of metadata and content
Developer aid: understanding impact of different
file system content & different index schemes
32
00.5
11.5
22.5
33.5
Original
TextCache
DisDir
DisFilte
rRelativeIndexSize
Beagle: Index Size
DefaultText
Image
Binary
-
8/9/2019 agrawal (4)
33/35
Impact of metadata and content
Reproducing identical file-system image to
compare other apps or ones developed later
33
00.5
11.5
22.5
33.5
Origina
l
TextC
ache
DisDir
DisFilte
rRelativeIndexSize
Beagle: Index Size
DefaultText
Image
Binary
Future AppBeagle
-
8/9/2019 agrawal (4)
34/35
Conclusion Impressions framework for realistic FS images
Representative, controllable, reproducible, easy to use
Includes almost all file system params of interest
Extensible architecturePlug in new statistical constructs, new models for
metadata and content generation Powerful utility for file-system benchmarking
To be contributed publicly (coming soon) http://www.cs.wisc.edu/adsl/Software/Impressions
34
-
8/9/2019 agrawal (4)
35/35
Nitin Agrawalhttp://www.cs.wisc.edu/~nitina
Impressions download (coming soon)http://www.cs.wisc.edu/adsl/Software/Impressions
35
Questions?
i