Post on 14-Aug-2020
© Prof. Dr.-Ing. Wolfgang Lehner |
SOFORT: A Hybrid SCM-DRAM Storage Engine for Fast Data Recovery
Ismail Oukid*°, Daniel Booss°, Wolfgang Lehner*, Peter Bumbulis°, and Thomas Willhalm+
*Dresden University of Technology
°SAP AG
+Intel GmbH
DaMoN 2014, Snowbird, Utah, USA, June 23, 2014
© Ismail Oukid
What is Storage Class Memory?
Storage Class Memory
Non-Volatile Memory
Byte-addressable (load/store semantics)
Latency close to DRAM's, with writes slower than reads
Denser than DRAM (i.e. better scaling)
More energy efficient than DRAM
e.g. PCM, STT-RAM, MRAM, and memristors
SCM Compared to Other Technologies
[Figure: latency versus bandwidth chart positioning HDD, SSD, SCM, and DRAM]
Storage Class Memory is a merging point between memory and storage
Bridging the gap between memory and storage
Main Memory DB Restart Time
• Availability guarantees are an important part of many SLAs
• Availability of 99% → downtime of 3.65 days/year!
• Many database crash conditions are transient
• A reasonable approach to recovery: restarting the database
• Database restart time directly impacts database availability
• Long restart times: a shortcoming of main memory databases
Improve database restart time using SCM
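The availability figure above follows from simple arithmetic. A minimal sketch, assuming a 365-day year; the function name downtimeDaysPerYear is hypothetical and not part of the talk:

```cpp
#include <cassert>
#include <cmath>

// Assumption: a year has 365 days (leap years ignored).
constexpr double kDaysPerYear = 365.0;

// Yearly downtime in days implied by an availability given in percent.
inline double downtimeDaysPerYear(double availabilityPercent) {
    assert(availabilityPercent >= 0.0 && availabilityPercent <= 100.0);
    return (100.0 - availabilityPercent) / 100.0 * kDaysPerYear;
}
```

With 99% availability this gives about 3.65 days per year, matching the slide; 99.9% would still allow roughly 0.365 days (about 8.76 hours).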
Overview of SOFORT (1)
SOFORT: A hybrid SCM-DRAM storage engine for fast data recovery
Architecture:
• Single-level column-store
• Dictionary encoded
• Multi-version concurrency control (MVCC)
• No transactional log
Requirements:
• Fast data recovery
• Mixed OLTP and OLAP
• A hybrid SCM-DRAM memory environment
Overview of SOFORT (2)
[Figure: SOFORT architecture. Components shown: tables; columns (dictionary encoded, append-only) with value IDs, a dictionary, and an optional dictionary index; an MVCC array of MVCC entries (CTS, DTS); a TX array. Legend: persisted on SCM / volatile in DRAM. Some structures are kept on DRAM to enable better performance.]
Persisting Data on the Fly
• SCM is accessed via the CPU cache: writes are not guaranteed to have reached SCM → need to flush data from the cache
• Code is reordered at compilation time and CPUs execute out of order → need to order memory operations
• Persistence primitives: flushing instructions (CLFLUSH), memory barriers (MFENCE, SFENCE, LFENCE) and non-temporal stores (…)
• Data may still be held in memory controller buffers → need to flush memory controller buffers on power failures
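A minimal sketch of how the flushing and fencing primitives named above might be combined into one helper. The names persist, cacheLinesSpanned and kCacheLine are hypothetical, the 64-byte cache line size is an assumption, and the non-x86 branch is only a placeholder fence, not a real flush. Memory controller buffers are not covered by this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#define SKETCH_HAS_CLFLUSH 1
#else
#define SKETCH_HAS_CLFLUSH 0
#endif

constexpr std::size_t kCacheLine = 64;  // assumption: 64-byte cache lines

// Number of cache lines touched by the byte range [addr, addr + len).
inline std::size_t cacheLinesSpanned(std::uintptr_t addr, std::size_t len) {
    if (len == 0) return 0;
    std::uintptr_t first = addr / kCacheLine;
    std::uintptr_t last = (addr + len - 1) / kCacheLine;
    return static_cast<std::size_t>(last - first + 1);
}

// Flushes every cache line of [addr, addr + len), fenced on both sides so
// that earlier stores are ordered before the flushes and later stores after.
inline void persist(const void* addr, std::size_t len) {
    assert(addr != nullptr || len == 0);
    if (len == 0) return;
    auto start = reinterpret_cast<std::uintptr_t>(addr);
    std::uintptr_t line = start & ~static_cast<std::uintptr_t>(kCacheLine - 1);
    std::uintptr_t end = start + len;
#if SKETCH_HAS_CLFLUSH
    _mm_mfence();
    for (; line < end; line += kCacheLine) {
        _mm_clflush(reinterpret_cast<const void*>(line));
    }
    _mm_mfence();
#else
    // Placeholder only: orders memory operations but does not flush caches.
    (void)line;
    (void)end;
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}
```

A caller would write its data first and then call persist(&data, sizeof(data)) before publishing any flag that declares the data durable.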
Persisting Data on the Fly: Example
Int var = 0;
Bool persisted = false;
…
var = 1;
Flush var;
persisted = true;
Flush persisted;
…
Without ordering guarantees, the stores and flushes may be reordered. Possible states, given as (Cache, SCM):
• Initially: var = (0, 0), persisted = (False, False)
• persisted is written and flushed first: var = (0, 0), persisted = (True, True)
• var is written but has not reached SCM yet: var = (1, 0), persisted = (True, True)
A failure in either of the last two states leaves persisted = True on SCM although var = 1 never reached SCM.
Fix, step by step: add an MFENCE after "Flush var", then another MFENCE between "var = 1" and "Flush var", and finally MFENCEs before and after "Flush persisted":
Int var = 0;
Bool persisted = false;
…
var = 1;
MFENCE;
Flush var;
MFENCE;
persisted = true;
MFENCE;
Flush persisted;
MFENCE;
…
State (Cache, SCM): var = (0, 0), persisted = (False, False)
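The effect of ordering in the example can be replayed with a toy model. This is a sketch under simplifying assumptions, not the authors' code: Cell and consistentAfterCrash are hypothetical names, a flush is modeled as an instantaneous copy from cache to SCM, and a crash discards the cached copy.

```cpp
#include <cassert>

// Toy model of one variable: its value in the CPU cache and its durable value on SCM.
struct Cell {
    int cache = 0;
    int scm = 0;
    void store(int v) { cache = v; }  // a store only changes the cached copy
    void flush() { scm = cache; }     // flush: the cached copy reaches SCM
    void crash() { cache = scm; }     // power failure: the cached copy is lost
};

// Executes the first `steps` operations (0 to 4) of the example, then crashes.
// inOrder == true  : var = 1; flush var; persisted = true; flush persisted
// inOrder == false : the two store/flush pairs are swapped (a possible reordering)
// Returns true if the recovered state is consistent, i.e. persisted implies var == 1.
inline bool consistentAfterCrash(int steps, bool inOrder) {
    assert(steps >= 0 && steps <= 4);
    Cell var, persisted;
    Cell& first = inOrder ? var : persisted;
    Cell& second = inOrder ? persisted : var;
    if (steps >= 1) first.store(1);
    if (steps >= 2) first.flush();
    if (steps >= 3) second.store(1);
    if (steps >= 4) second.flush();
    var.crash();
    persisted.crash();
    return persisted.scm == 0 || var.scm == 1;
}
```

In program order every crash point recovers to a consistent state. With the pairs swapped, crashing after two or three steps leaves persisted set on SCM while var is still 0, which is the inconsistent state the fences are meant to rule out.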
Persistent Memory Allocator
Persistent Memory File System (PMFS):
• SCM-aware file system
• No buffering in DRAM on mmap → direct access to SCM
PMAllocator:
• Huge PMFS files as memory pages
• Pages cut into segments for allocation
• Persistent counter of memory pages
• Mapping from persistent memory to VM
[Figure: each PMFS file is one memory page, the file name is the unique page ID, and each page is cut into segments]
Persistent Memory Pointers
Regular pointers are bound to the program's address space → cannot be used for recovery
We propose Persistent Memory Pointers (PMPtrs):
struct PMPtr {
    int64_t m_BaseID;    // persistent memory page ID
    ptrdiff_t m_Offset;  // offset indicating the start of the allocated block
};
• Can be converted (swizzled) to regular pointers
• PMPtrs stay valid across failures
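A minimal sketch of the conversion between PMPtrs and regular pointers. Only the PMPtr fields come from the slide; the PageTable class, its methods, and the idea of keeping the page-ID-to-address mapping in a hash map are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct PMPtr {
    std::int64_t m_BaseID;    // persistent memory page ID
    std::ptrdiff_t m_Offset;  // offset of the allocated block within that page
};

// Hypothetical mapping from page ID to the address at which the page is
// mapped in the current process; it is rebuilt on every restart.
class PageTable {
public:
    void map(std::int64_t pageID, void* base) {
        bases_[pageID] = static_cast<char*>(base);
    }

    // Swizzle: PMPtr -> regular pointer (nullptr if the page is not mapped).
    void* toPtr(const PMPtr& p) const {
        auto it = bases_.find(p.m_BaseID);
        return it == bases_.end() ? nullptr : it->second + p.m_Offset;
    }

    // Unswizzle: regular pointer inside a known page -> PMPtr.
    PMPtr toPMPtr(std::int64_t pageID, const void* addr) const {
        auto it = bases_.find(pageID);
        assert(it != bases_.end());
        return PMPtr{pageID, static_cast<const char*>(addr) - it->second};
    }

private:
    std::unordered_map<std::int64_t, char*> bases_;
};
```

Because a PMPtr stores only a page ID and an offset, it remains meaningful when the same page is mapped at a different virtual address after a restart; only the table has to be rebuilt.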
Recovery Mechanism
• PM Entry Point: a PMPtr to the PMAllocator, a PMPtr to the storage engine
• Reload PMAllocator: reload memory pages, reconstruct mapping to VM
• Reload SOFORT: roll back unfinished transactions, allocate resources for recovery
• Reload Table 1 … Table M: data structures sanity check, initiate columns recovery
• Reload column 1 … column N: data structures sanity check, reconstruct dictionary indexes
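The last recovery step, reconstructing a dictionary index, might look like the sketch below. It assumes the dictionary is an array of distinct values in which the position of a value is its value ID, and it uses a hash map as the volatile index; the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Assumption: the persisted dictionary is an array of distinct values and the
// value ID of an entry is its position in that array.
using Dictionary = std::vector<std::string>;
// Volatile index from value to value ID, rebuilt on restart.
using DictIndex = std::unordered_map<std::string, std::uint32_t>;

inline DictIndex rebuildDictIndex(const Dictionary& dict) {
    DictIndex index;
    index.reserve(dict.size());
    for (std::uint32_t id = 0; id < dict.size(); ++id) {
        bool inserted = index.emplace(dict[id], id).second;
        assert(inserted && "dictionary values must be distinct");
        (void)inserted;
    }
    return index;
}
```

The loop touches each distinct value once and never reads the value ID columns, so the work grows with the dictionary size rather than with the number of rows.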
Evaluation: Setup
Hardware-based SCM simulation:
• Intel Xeon E5 @ 2.60 GHz, 20 MB L3 cache, 8 physical cores/socket
• Avoiding NUMA effects: benchmark run on a single socket
• Limitation: symmetric instead of asymmetric read/write latency
[Figure: two sockets (Socket 0 and Socket 1), each with a CPU (8 cores), DRAM, and a memory bus, connected by a QPI link; an SCM region exposed as a PMFS mount]
Evaluation: Setup
TATP benchmark:
• Simple but realistic OLTP benchmark (20% writes, 80% reads)
• Simulates a telecommunication application
• Initial population of 1 million subscribers (~2 GB data size)
Shore-MT:
• A storage manager designed for scalability on multicore systems
• Run on a RAM disk (tmpfs)
Restart time:
• Single, 4-column table
• SCM latency set to 200 ns
Evaluation: Throughput
[Figure: GetSubData throughput (read only). x-axis: # users (1 to 16); y-axis: Mio. Tx/s (0 to 2). Series: shore-read-shm, sofort-read-shm, sofort-read-200ns, sofort-read-300ns, sofort-read-700ns. Annotations: worst case, upper bound, realistic case.]
User contention. Solved with PLP.
Evaluation: Throughput
[Figure: TATP throughput (80% read, 20% write). x-axis: # users (1 to 16); y-axis: Mio. Tx/s (0 to 1.2). Series: shore-shm, sofort-shm, sofort-200ns, sofort-300ns, sofort-700ns. Annotations: realistic case, upper bound, worst case; ~900K Tx/s, ~250K Tx/s, ~550K Tx/s.]
SOFORT outperforms Shore-MT even in a high latency SCM environment
Evaluation: Restart Time
[Figure: restart time (s) with data size fixed. x-axis: Mio. distinct values (2 to 10). Series: time-dbfixed-skiplist, time-dbfixed-hashmap.]
[Figure: restart time (s) with dictionary size fixed. x-axis: Mio. rows/table (50 to 250). Series: time-dictfixed-skiplist, time-dictfixed-hashmap.]
Restart time is independent of instance size and depends only on the size of the dictionaries
[Figure: throughput (Mio. Tx/s) over time (s, 0 to 50) in two panels, SOFORT on SCM and SOFORT on DISK. Annotations: ~2 sec, ~18 sec.]
• Restart time is independent of data size
• Volatile structures drive restart time
• No persistence guarantees
• SOFORT recovers nearly instantly
Conclusion and Future Work
We showed that SCM can help get rid of the transactional log and significantly improve database restart times
A latency study based on hardware simulation showed that SOFORT exhibits competitive performance even in a high-latency SCM environment
Future Work:
• Extend SOFORT to support long transactions
• Explore new recovery schemes
• Design persistent index structures
• Experiment with a write-through caching policy
Thank You! Questions? Comments?
Ismail Oukid
Takeaway: SCM is a game changer in the database industry. Make the change happen!