Post on 21-Jan-2015
UNIVERSITY OF MASSACHUSETTS AMHERST • Department of Computer Science
Emery Berger
University of Massachusetts Amherst
Operating Systems CMPSCI 377
Dynamic Memory Management
Dynamic Memory Management
How the heap manager is implemented
malloc, free
new, delete
Memory Management
Ideal memory manager:
Fast
Raw time, asymptotic runtime, locality
Memory efficient
Low fragmentation
With multicore & multiprocessors:
Scalable to multiple processors
New issues:
Secure from attack
Reliable in face of errors
Memory Manager Functions
Not just malloc/free
realloc
Change size of object, copying old contents
ptr = realloc (ptr, 10);
But: realloc(ptr, 0) = ?
How about: realloc (NULL, 16) ?
Other fun
calloc
memalign
Needs ability to locate size & object start
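The two realloc questions above have answers in ISO C: realloc(NULL, n) behaves like malloc(n), while realloc(ptr, 0) is not portable (implementation-defined through C17, undefined in C23). A minimal sketch of a wrapper that respects both; the name resize_buffer is hypothetical, not part of any library:

```c
#include <assert.h>
#include <stdlib.h>

/* Resize *buf to new_size bytes (new_size must be > 0).
 * Returns 1 on success. On failure the old block is left untouched,
 * unlike the pattern ptr = realloc(ptr, n), which loses the old pointer. */
int resize_buffer(char **buf, size_t new_size) {
    assert(new_size > 0);                 /* realloc(p, 0) is not portable */
    char *tmp = realloc(*buf, new_size);  /* *buf == NULL acts like malloc */
    if (tmp == NULL) {
        return 0;
    }
    *buf = tmp;
    return 1;
}
```

Starting from char *p = NULL, a first call resize_buffer(&p, 16) allocates, and a later resize_buffer(&p, 64) grows the block while preserving the first 16 bytes.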
Fragmentation
Intuitively, fragmentation stems from “breaking” up heap into unusable spaces
More fragmentation = worse utilization of memory
External fragmentation
Wasted space outside allocated objects
Internal fragmentation
Wasted space inside an object
Classical Algorithms
First-fit
find first chunk of desired size
Classical Algorithms
Best-fit
find chunk that fits best
Minimizes wasted space
Classical Algorithms
Worst-fit
find chunk that fits worst
then split object
Reclaim space: coalesce free adjacent objects into one big object
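The three classical policies differ only in which free chunk they pick. A minimal sketch over a singly linked free list; the types free_chunk and fit_policy and the function choose_chunk are hypothetical names for illustration, and splitting and coalescing are left out:

```c
#include <assert.h>
#include <stddef.h>

typedef struct free_chunk {
    size_t size;                 /* usable bytes in this free chunk */
    struct free_chunk *next;
} free_chunk;

typedef enum { FIRST_FIT, BEST_FIT, WORST_FIT } fit_policy;

/* Return the chunk the policy selects for a request of `want` bytes,
 * or NULL if no chunk is large enough. The list is not modified. */
free_chunk *choose_chunk(free_chunk *list, size_t want, fit_policy policy) {
    assert(want > 0);
    free_chunk *chosen = NULL;
    for (free_chunk *c = list; c != NULL; c = c->next) {
        if (c->size < want) {
            continue;            /* too small under every policy */
        }
        if (policy == FIRST_FIT) {
            return c;            /* stop at the first chunk that fits */
        }
        if (chosen == NULL
            || (policy == BEST_FIT && c->size < chosen->size)
            || (policy == WORST_FIT && c->size > chosen->size)) {
            chosen = c;          /* tightest (best) or loosest (worst) so far */
        }
    }
    return chosen;
}
```

First-fit can stop early, while best-fit and worst-fit scan the whole list, which is one reason their malloc costs differ.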
Implementation Techniques
Freelists
Linked lists of objects in same size class
Range of object sizes
First-fit, best-fit in this context
Implementation Techniques
Segregated size classes
Use free lists, but never coalesce or split
Choice of size classes
Exact
Powers-of-two
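With powers-of-two size classes, every request is rounded up to the next class, and the difference is internal fragmentation. A small sketch, assuming a smallest class of 8 bytes (MIN_CLASS_SIZE, class_size, and internal_waste are illustrative names):

```c
#include <assert.h>
#include <stddef.h>

#define MIN_CLASS_SIZE 8   /* assumed smallest size class */

/* Round a request up to its power-of-two size class. */
size_t class_size(size_t request) {
    assert(request <= ((size_t)1 << 30));  /* keeps the shift from overflowing */
    size_t size = MIN_CLASS_SIZE;
    while (size < request) {
        size <<= 1;
    }
    return size;
}

/* Bytes wasted inside the object (internal fragmentation). */
size_t internal_waste(size_t request) {
    return class_size(request) - request;
}
```

A 65-byte request lands in the 128-byte class and wastes 63 bytes, which is close to the worst case of just under half the class size.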
Implementation Techniques
Big Bag of Pages (BiBOP)
Page or pages (multiples of 4K)
Usually segregated size classes
Header contains metadata
Locate with bitmasking
Limits external fragmentation
Can be very fast
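"Locate with bitmasking" can be sketched as follows, assuming 4K pages with the header at the start of each page and a fixed 16-byte header area. The layout and the names page_header, HEADER_BYTES, new_page, header_of, and object_start are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u
#define HEADER_BYTES 16u   /* assumed space reserved for the header */

typedef struct {
    size_t object_size;    /* size class of every object on this page */
} page_header;

/* Allocate one page-aligned page for objects of a single size class. */
page_header *new_page(size_t object_size) {
    page_header *h = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
    if (h != NULL) {
        h->object_size = object_size;
    }
    return h;
}

/* Clear the low 12 bits of any interior pointer to reach the page header. */
page_header *header_of(const void *ptr) {
    return (page_header *)((uintptr_t)ptr & ~(uintptr_t)(PAGE_SIZE - 1));
}

/* Find the start of the object containing ptr (ptr must be past the header). */
void *object_start(const void *ptr) {
    page_header *h = header_of(ptr);
    unsigned char *first = (unsigned char *)h + HEADER_BYTES;
    assert((const unsigned char *)ptr >= first);
    size_t index = (size_t)((const unsigned char *)ptr - first) / h->object_size;
    return first + index * h->object_size;
}
```

Both lookups take constant time and need no per-object header, which is what realloc and free rely on to find an object's size and start.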
Runtime Analysis
Key components
Cost of malloc (best, worst, average)
Cost of free
Cost of size lookup (for realloc & free)
Examine for first-fit, best-fit, segregated (with BiBOP)
Space Bounds
Fragmentation worst-case for “optimal”: O(log(M/m))
M = largest object size
m = smallest object size
Best-fit = O(M * m) !
Goal: perform well for typical programs
Considerations:
Internal fragmentation
External fragmentation
Headers (metadata)
Performance Issues
We’ll talk about scalability later
Reliability, too
But: general-purpose allocator often seen as too slow
Custom Memory Allocation
Very common practice
Apache, gcc, lcc, STL, database servers…
Language-level support in C++
Widely recommended
Programmers replace new/delete, bypassing system allocator
Reduce runtime – often
Expand functionality – sometimes
Reduce space – rarely
“Use custom allocators”
Drawbacks of Custom Allocators
Avoiding system allocator:
More code to maintain & debug
Can’t use memory debuggers
Not modular or robust:
Mix memory from custom and general-purpose allocators → crash!
Increased burden on programmers
Are custom allocators really a win?
(I) Per-Class Allocators
Recycle freed objects from a free list
a = new Class1;
b = new Class1;
c = new Class1;
delete a;
delete b;
delete c;
a = new Class1;
b = new Class1;
c = new Class1;
[Figure: Class1 free list holding the freed objects a, b, c]
+ Fast: linked list operations
+ Simple: identical semantics, C++ language support
- Possibly space-inefficient
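The slide's C++ example relies on class-specific operator new/delete. The same recycling idea can be sketched in C as below; Class1's fields and the function names class1_new and class1_delete are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Class1 {
    int payload[4];              /* hypothetical object fields */
} Class1;

/* A freed object's storage is reused to hold the free-list link. */
typedef union Class1Slot {
    Class1 object;
    union Class1Slot *next_free;
} Class1Slot;

static Class1Slot *class1_free_list = NULL;

Class1 *class1_new(void) {
    Class1Slot *slot = class1_free_list;
    if (slot != NULL) {
        class1_free_list = slot->next_free;   /* pop a recycled object */
    } else {
        slot = malloc(sizeof *slot);          /* fall back to the system heap */
    }
    return slot != NULL ? &slot->object : NULL;
}

void class1_delete(Class1 *obj) {
    if (obj == NULL) {
        return;
    }
    Class1Slot *slot = (Class1Slot *)obj;
    assert(slot != class1_free_list);         /* cheap check: immediate double delete */
    slot->next_free = class1_free_list;       /* push onto the free list */
    class1_free_list = slot;
}
```

Freed objects never return to the general-purpose heap in this sketch, which is the "possibly space-inefficient" drawback noted above.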
(II) Custom Patterns
Tailor-made to fit allocation patterns
Example: 197.parser (natural language parser)
char[MEMORY_LIMIT]
a = xalloc(8);
b = xalloc(16);
c = xalloc(8);
xfree(b);
xfree(c);
d = xalloc(8);
[Figure: a, b, c, then d laid out in the array as end_of_array advances and retreats]
+ Fast: pointer-bumping allocation
- Brittle: fixed memory size, requires stack-like lifetimes
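A minimal sketch of a pointer-bumping allocator in the style described above. This is not the 197.parser source: the header layout, the 4096-byte MEMORY_LIMIT, and the rule that end_of_array retreats only over freed blocks at the top are assumptions made for illustration:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define MEMORY_LIMIT 4096
#define ALIGN alignof(max_align_t)
#define ROUND_UP(n) (((n) + ALIGN - 1) / ALIGN * ALIGN)
#define HDR ROUND_UP(sizeof(xheader))
#define NO_BLOCK ((size_t)-1)

typedef struct {
    size_t prev;    /* offset of the previous block's header, or NO_BLOCK */
    int freed;      /* set by xfree; space is reclaimed only from the top */
} xheader;

static alignas(max_align_t) unsigned char memory[MEMORY_LIMIT];
static size_t end_of_array = 0;        /* offset of the first unused byte */
static size_t last_block = NO_BLOCK;   /* offset of the most recent header */

void *xalloc(size_t size) {
    if (size > MEMORY_LIMIT) {
        return NULL;
    }
    size_t need = HDR + ROUND_UP(size);
    if (need > MEMORY_LIMIT - end_of_array) {
        return NULL;                   /* fixed memory size: no growth */
    }
    xheader *h = (xheader *)(memory + end_of_array);
    h->prev = last_block;
    h->freed = 0;
    last_block = end_of_array;
    end_of_array += need;              /* bump the pointer */
    return (unsigned char *)h + HDR;
}

void xfree(void *p) {
    if (p == NULL) {
        return;
    }
    assert((unsigned char *)p >= memory + HDR
           && (unsigned char *)p < memory + end_of_array);
    ((xheader *)((unsigned char *)p - HDR))->freed = 1;
    /* Retreat end_of_array over every freed block at the top of the stack. */
    while (last_block != NO_BLOCK) {
        xheader *top = (xheader *)(memory + last_block);
        if (!top->freed) {
            break;
        }
        end_of_array = last_block;
        last_block = top->prev;
    }
}
```

Running the slide's sequence, xfree(b) reclaims nothing because c is still live above it; xfree(c) then retreats past both c and b, so d = xalloc(8) reuses b's address. Without stack-like lifetimes, freed space below a live block stays unusable.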
(III) Regions
Separate areas, deletion only en masse
regioncreate(r)
regionmalloc(r, sz)
regiondelete(r)
+ Fast: pointer-bumping allocation, deletion of chunks
+ Convenient: one call frees all memory
- Risky: dangling references, too much space
Increasingly popular custom allocator
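A minimal region sketch using the three function names from the slide. The signatures, the 4096-byte CHUNK_SIZE, and the chunk layout are assumptions; real region libraries differ:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096   /* assumed default chunk capacity in bytes */

typedef struct chunk {
    struct chunk *next;
    size_t used;          /* bytes handed out from data[] so far */
    size_t capacity;      /* bytes available in data[] */
    max_align_t data[];   /* flexible array member, suitably aligned */
} chunk;

typedef struct {
    chunk *head;          /* newest chunk; allocation bumps inside it */
} region;

void regioncreate(region *r) {
    assert(r != NULL);
    r->head = NULL;
}

void *regionmalloc(region *r, size_t sz) {
    const size_t align = _Alignof(max_align_t);
    if (sz > SIZE_MAX / 2) {
        return NULL;
    }
    sz = (sz + align - 1) / align * align;
    if (r->head == NULL || r->head->capacity - r->head->used < sz) {
        size_t cap = sz > CHUNK_SIZE ? sz : CHUNK_SIZE;
        chunk *c = malloc(sizeof(chunk) + cap);
        if (c == NULL) {
            return NULL;
        }
        c->next = r->head;
        c->used = 0;
        c->capacity = cap;
        r->head = c;
    }
    void *p = (unsigned char *)r->head->data + r->head->used;
    r->head->used += sz;              /* pointer-bumping allocation */
    return p;
}

/* Free every chunk at once; individual objects cannot be freed. */
void regiondelete(region *r) {
    chunk *c = r->head;
    while (c != NULL) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
    r->head = NULL;
}
```

Any pointer obtained from regionmalloc dangles after regiondelete, and a long-lived region holds all of its memory until then; these are the two risks listed above.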
Custom Allocators Are Faster…
[Figure: Runtime - Custom Allocator Benchmarks. Normalized runtime (axis 0 to 1.75) for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped as non-regions and regions; series: Custom, Win32]
As good as and sometimes much faster than Win32
Not So Fast…
[Figure: Runtime - Custom Allocator Benchmarks. Normalized runtime (axis 0 to 1.75) for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped as non-regions and regions; series: Custom, Win32, DLmalloc]
DLmalloc: as fast or faster for most benchmarks
The Lea Allocator (DLmalloc 2.7.0)
Mature public-domain general-purpose allocator
Optimized for common allocation patterns
Per-size quicklists ≈ per-class allocation
Deferred coalescing (combining adjacent free objects)
Highly-optimized fastpath
Space-efficient
Space Consumption: Mixed Results
[Figure: Space - Custom Allocator Benchmarks. Normalized space (axis 0 to 1.75) for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped as non-regions and regions; series: Custom, DLmalloc]
Custom Allocators?
Generally not worth the trouble: use good general-purpose allocator
Avoids risky software engineering errors
Problems with Unsafe Languages
C, C++: pervasive applications, but the languages are memory-unsafe
Numerous opportunities for security vulnerabilities, errors
Double free
Invalid free
Uninitialized reads
Dangling pointers
Buffer overflows (stack & heap)
Soundness for “Erroneous” Programs
Normally: memory errors ⇒ undefined behavior …
Consider infinite-heap allocator:
All news fresh; ignore delete
No dangling pointers, invalid frees, double frees
Every object infinitely large
No buffer overflows, data overwrites
Transparent to correct program
“Erroneous” programs sound
Probabilistic Memory Safety
Approximate with M-heaps (e.g., M=2)
DieHard: fully-randomized M-heap
Increases odds of benign errors
Probabilistic memory safety
i.e., P(no error) ≥ n
Errors independent across heaps ⇒ E(users with no error) ≥ n * |users|
Efficient implementation…
Implementation Choices
Conventional, freelist-based heaps
Hard to randomize, protect from errors
Double frees, heap corruption
What about bitmaps? [Wilson90]
– Catastrophic fragmentation
Each small object likely to occupy one page
[Figure: four small objects (obj), each occupying its own page]
Randomized Heap Layout
Bitmap-based, segregated size classes
Bit represents one object of given size
i.e., one bit = 2^(i+3) bytes, etc.
Prevents fragmentation
[Figure: metadata bitmaps 00000001, 1010, 10 for the size classes 2^(i+3), 2^(i+4), 2^(i+5), kept separate from the heap they describe]
Randomized Allocation
malloc(8):
compute size class = ceil(log2 sz) – 3
randomly probe bitmap for zero-bit (free)
Fast: runtime O(1)
M = 2: E[# of probes] ≤ 2
[Figure: metadata bitmaps for size classes 2^(i+3), 2^(i+4), 2^(i+5); the first bitmap changes from 00000001 to 00010001 after malloc(8) sets a randomly chosen free bit]
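The malloc steps listed for randomized allocation can be sketched as follows. This is an illustration and not DieHard's code: the 64-slot bitmap, the use of rand(), and the names class_bitmap, size_class, and random_alloc are assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SLOTS 64   /* objects per size class (hypothetical) */
#define M 2        /* heap expansion factor: at most 1/M of the slots in use */

typedef struct {
    uint64_t bits;   /* bit k set => slot k is allocated */
    int in_use;      /* number of set bits */
} class_bitmap;

/* size class = ceil(log2 sz) - 3, so 8 bytes -> 0, 16 -> 1, 32 -> 2, ... */
int size_class(size_t sz) {
    assert(sz <= ((size_t)1 << 30));   /* keeps the shift from overflowing */
    int cls = 0;
    for (size_t size = 8; size < sz; size <<= 1) {
        cls++;
    }
    return cls;
}

/* Probe random slots until a free one is found; returns the slot index,
 * or -1 if the class is already 1/M full. */
int random_alloc(class_bitmap *b) {
    if (b->in_use >= SLOTS / M) {
        return -1;
    }
    for (;;) {
        int slot = rand() % SLOTS;     /* rand() stands in for a better RNG */
        uint64_t mask = (uint64_t)1 << slot;
        if ((b->bits & mask) == 0) {
            b->bits |= mask;
            b->in_use++;
            return slot;
        }
    }
}
```

Because fewer than half the slots are ever in use when M = 2, each probe finds a free slot with probability above 1/2, so the expected number of probes stays below 2.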
Randomized Deallocation
free(ptr):
Ensure object valid – aligned to right address
Ensure allocated – bit set
Resets bit
Prevents invalid frees, double frees
[Figure: metadata bitmaps for size classes 2^(i+3), 2^(i+4), 2^(i+5); the first bitmap changes from 00010001 back to 00000001 after free(ptr) resets the object's bit]
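The free(ptr) checks can be sketched for a single size class as below. The region layout and the names size_class_region and checked_free are hypothetical; the point is that a bad free is detected and ignored instead of corrupting a freelist:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64   /* objects per size class (hypothetical) */

typedef struct {
    unsigned char *base;   /* start of this size class's region */
    size_t object_size;    /* bytes per object in this class */
    uint64_t bits;         /* bit k set => slot k is allocated */
} size_class_region;

/* Returns 1 if ptr was a live object and is now free, 0 if the call
 * was an invalid or double free and has been ignored. */
int checked_free(size_class_region *r, void *ptr) {
    assert(r->object_size > 0);
    uintptr_t addr = (uintptr_t)ptr;
    uintptr_t lo = (uintptr_t)r->base;
    if (addr < lo || addr - lo >= SLOTS * r->object_size) {
        return 0;                       /* not in this region: invalid free */
    }
    size_t offset = addr - lo;
    if (offset % r->object_size != 0) {
        return 0;                       /* not at an object start: invalid free */
    }
    uint64_t mask = (uint64_t)1 << (offset / r->object_size);
    if ((r->bits & mask) == 0) {
        return 0;                       /* bit already clear: double free */
    }
    r->bits &= ~mask;                   /* reset the bit */
    return 1;
}
```

Since the bitmap lives apart from the objects, none of these checks read or write heap data that a buffer overflow could have damaged.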
Randomized Heaps & Reliability
[Figure: the same numbered objects placed in two different random orders within the size classes 2^(i+3) and 2^(i+4), one layout per run]
My Mozilla: “malignant” overflow
Your Mozilla: “benign” overflow
Objects randomly spread across heap
Different run = different heap
Errors across heaps independent
DieHard software architecture
[Figure: input is broadcast to replica1 (seed1), replica2 (seed2), and replica3 (seed3), which execute as separate processes; a vote over their results produces the output]
Replication-based fault-tolerance
Requires randomization: errors independent
DieHard Results
Analytical results (pictures!)
Buffer overflows
Uninitialized reads
Dangling pointer errors (the best)
Empirical results
Runtime overhead
Error avoidance
Injected faults & actual applications
Analytical Results: Buffer Overflows
Model overflow as random write of live data
Heap half full (max occupancy)
Analytical Results: Buffer Overflows
Replicas: Increase odds of avoiding overflow in at least one replica
[Figure: three replicas, each with its own randomized heap]
P(Overflow in all replicas) = (1/2)^3 = 1/8
P(No overflow in ≥ 1 replica) = 1 - (1/2)^3 = 7/8
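The three-replica arithmetic generalizes to k replicas if errors are independent across replicas, which is the assumption randomization is meant to justify. A small sketch; the function name p_some_replica_ok is hypothetical, and p_overflow = 1/2 corresponds to the half-full heap above:

```c
#include <assert.h>

/* Probability that at least one of k independent replicas avoids the
 * overflow, given that each replica is hit with probability p_overflow. */
double p_some_replica_ok(double p_overflow, int k) {
    assert(k >= 1);
    assert(p_overflow >= 0.0 && p_overflow <= 1.0);
    double all_hit = 1.0;
    for (int i = 0; i < k; i++) {
        all_hit *= p_overflow;    /* P(overflow corrupts every replica so far) */
    }
    return 1.0 - all_hit;
}
```

With p_overflow = 0.5 and k = 3 this gives 7/8, matching the slide; each added replica halves the remaining failure probability.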
Analytical Results: Buffer Overflows
F = free space
H = heap size
N = # objects' worth of overflow
k = replicas
Overflow one object
Empirical Results: Runtime
Empirical Results: Error Avoidance
Injected faults:
Dangling pointers (@50%, 10 allocations) – glibc: crashes; DieHard: 9/10 correct
Overflows (@1%, 4 bytes over) – glibc: crashes 9/10, inf loop; DieHard: 10/10 correct
Real faults:
Avoids Squid web cache overflow (crashes BDW & glibc)
Avoids dangling pointer error in Mozilla (DoS in glibc & Windows)
The End