The 14th Annual Smithsonian Photo Contest – Featured Entries (2)
Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An...
Transcript of Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An...
![Page 1: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/1.jpg)
计算机学院
数据库系统原理 9
Storage and Index 3
B+ Tree
2019.11
杨宁
1/73
![Page 2: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/2.jpg)
Reminder on Heap Files
• Two access APIs:– fetch by recordId (pageId, slotId)– scan (starting from some page)
Page 1 Page 2
Page 3 Page 4
Page 5 Page 6
Header Page
DIRECTORY …
![Page 3: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/3.jpg)
Wouldn’t it be nice…• …if we could look things up by value?
• Toward a Declarative access API
• But … efficiency?“If you don't find it in the index, look very carefully through the entire catalog. ”—Sears, Roebuck, and Co., Consumers' Guide, 1897
![Page 4: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/4.jpg)
We’ve seen this before
• Data structures … in RAM:
– Search trees (Binary, AVL, Red-Black, …)
– Hash tables
• Needed: disk-based data structures
– “paginated”: made up of disk pages!
![Page 5: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/5.jpg)
IndexAn index is data structure that enables fast lookup and modification of data entries by search key
• Lookup: may support many different operations– Equality, 1-d range, 2-d region, …
• Search Key: any subset of columns in the relation– Do not need to be unique
• —e.g. (firstname) or (firstname, lastname)
![Page 6: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/6.jpg)
Index Part 2An index is data structure that enables fast lookup and modification of data entries by search key
• Data Entries: items stored in the index– Assume for today: a pair (k, recordId) …
• Pointers to records in Heap Files!• Easy to generalize later
• Modification: want to support fast insert and delete
Many Types of indexes exist: B+-Tree, Hash, R-Tree, GiST, ...
![Page 7: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/7.jpg)
![Page 8: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/8.jpg)
Simple Idea?
• Step 1: Sort heap file & leave some space
– Pages physically stored in logical order (sequential access)– Do we need “next” pointers to link pages?
• No. Pages are physically sorted in logical order
• Step 2: Build the index data structure over this… – Why not just use binary search in this heap file?
• Fan-out of 2 à deep tree à lots of I/Os• Examine entire records just to read key during search
1, 2, _ 3, 4, _ 5, 6, _ 7, 8, _ 9, 10, _
3, 4, 5 1, 2, 7 8, 6, 9 10, _, _Input Heap
File
![Page 9: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/9.jpg)
Build a high fan-out search tree
• Start simple: Sorted (key, page id) file– No record data– Binary search in the key file. Better!– Forgot: Need to break across pages!
1, 2, _ 3, 4, _ 5, 6, _ 7, 8, _ 9, 10, _
1 3 5 7 9
![Page 10: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/10.jpg)
Build a high fan-out search tree
• Start simple: Sorted (key, page id) file– No record data– Binary search in the key file. Better!– Forgot: Need to break across pages!
1, 2, _ 3, 4, _ 5, 6, _ 7, 8, _ 9, 10, _
1 3 5 7 9
![Page 11: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/11.jpg)
Build a high fan-out search tree Part 2
• Start simple: Sorted (key, page id) file– No record data– Binary search in the key file. Better!– Complexity?
1, 2, _ 3, 4, _ 5, 6, _ 7, 8, _ 9, 10, _
1 3 5 7 9
![Page 12: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/12.jpg)
Build a high fan-out search tree Part 3
• Start simple: Sorted (key, page id) file– No record data– Binary search in the key file. Better!– Complexity: Still binary search, just a constant factor smaller input
1, 2, _ 3, 4, _ 5, 6, _ 7, 8, _ 9, 10, _
1 3 5 7 9
![Page 13: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/13.jpg)
Build a high fan-out search tree Part 4
• Recursively “index” key file• Key Invariant:
– Node […, (KL, PL), (KR, PR), ... ] à All tuples in range KL <= K < KR are in tree PL
1, 2, _ 3, 4, _ 5, 6, _ 7, 8, _ 9, 10, _
1 3 5 7 9
1 7
![Page 14: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/14.jpg)
Search a high fan-out search tree
• Searching for 5?– Binary Search each node (page) starting at root– Follow pointers to next level of search tree
• Complexity? O(logF(#Pages))
1, 2, _ 3, 4, _ 5, 6, _ 7, 8, _ 9, 10, _
1 3 5 7 9
1 7
![Page 15: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/15.jpg)
Left Key Optimization?
• Optimization– Do we need the left most key?
1, 2, _ 3, 4, _ 5, 6, _ 7, 8, _ 9, 10, _
1 3 5 7 9
1 7
1 7
1
![Page 16: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/16.jpg)
Build a high fan-out search tree
• Disk Layout? All in a single file, Data Pages first.
1, 2, _ 3, 4, _ 9,10,_Data Pages
1, 2, _ 3, 4, _ 5, 6, _ 7, 8, _ 9, 10, _
3 5 9
7
IndexPages
![Page 17: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/17.jpg)
Status Check
• Some design goals:– Fast sequential scan?– High Fan-out?– Support insertion? …
1, 2, _ 3, 4, _ 9,10,_Data Pages
Indexed File
1, 2, _ 3, 4, _ 5, 6, _ 7, 8, _ 9, 10, _
3 5 9
7
IndexPagesISAM
Indexed Sequential Access Method
(Early IBM Indexing Technology)
![Page 18: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/18.jpg)
Insert 11, Before
3, 4, _ 9,10,_Data Pages
1, 2, _ 3, 4, _ 5, 6, _ 7, 8, _ 9, 10, _
3 5 9
7
1, 2, _
IndexPages
![Page 19: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/19.jpg)
Insert 11, After
3, 4, _ 9,10,11Data Pages
1, 2, _ 3, 4, _ 5, 6, _ 7, 8, _ 9, 10, 11
3 5 9
7
1, 2, _
IndexPages
• Find location• Place in data page
– Re-sort page …
![Page 20: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/20.jpg)
Insert 12?
• Find location• Place in data page• Add overflow page if necessary …
OverflowPages
12, _, _
1, 2, _ 3, 4, _ 5, 6, _ 7, 8, _ 9, 10, 11
3 5 9
7
12, _, _
1, 2, _ 3, 4, _ 9,10,11Data PagesIndex
Pages
![Page 21: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/21.jpg)
Recap: ISAM
• Data entries in sorted heap file
• High fan-out static tree index
• Fast search + good locality– Assuming nothing changes
• Insert into overflow pages
![Page 22: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/22.jpg)
A Note of Caution
• ISAM is an old-fashioned idea– Introduced by IBM in 1960s– B+ trees are usually better, as we’ll see
• Though not always (ß we’ll come back to this)
• But, it’s a good place to start– Simpler than B+ tree, many of the same ideas
• Upshot– Don’t brag about ISAM on your resume– Do understand ISAM, and tradeoffs with B+ trees
![Page 23: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/23.jpg)
![Page 24: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/24.jpg)
B+-TREE
![Page 25: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/25.jpg)
Enter the B+ Tree
• Similar to ISAM– Same interior node structure
• <Key, Page Ptr> pairs with same key invariant– Same search routine as before
• Dynamic Tree Index– Always Balanced– Support efficient insertion & deletion
• Grows at root not leaves!
• “+”? B-tree that stores data entries in leaves only
![Page 26: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/26.jpg)
26/85
Example of a B+ Tree
• Occupancy Invariant– Each interior node is at least partially full:
• d <= #entries <= 2d• d: order of the tree (max fan-out = 2d + 1)
• Data pages at bottom need not be stored in logical order– Next and prev pointers
17
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
Root NodePage 1
Page 4 Page 6
Page 2 Page 3 Page 5 Page 7 Page 8 Page 9
![Page 27: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/27.jpg)
27/85
Sanity Check
What is the value of d? 2
What about the root?The root is special
Why not in sequential order?Data pages allocated dynamically
17
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
Root NodePage 1
Page 4 Page 6
Page 2 Page 3 Page 5 Page 7 Page 8 Page 9
![Page 28: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/28.jpg)
B+ Trees and Scale
• How big is a height 1 B+ tree– d = 2 à Fan-out?– Fan-out = 2d + 1 = 5– Height 1: 5 x 4 = 20 Records
Root Node
![Page 29: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/29.jpg)
B+ Trees and Scale Part 2
Root Node
• How big is a height 3 B+ tree• d = 2 à Fan-out?• Fan-out = 2d + 1 = 5• Height 3: 53 x 4= 500 Records
![Page 30: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/30.jpg)
B+ Trees in Practice• Typical order: 1600. Typical fill-factor: 67%.
– average fan-out = 2144– (assuming 128 Kbytes pages at 40Bytes per record)
• At typical capacities– Height 1: 21442 = 4,596,736 records– Height 2: 21443 = 9,855,401,984 records
![Page 31: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/31.jpg)
Searching the B+ Tree
17
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
Root NodePage 1
Page 4 Page 6
Page 2 Page 3 Page 5 Page 7 Page 8 Page 9
• Same as ISAM• Find key = 27
– Find split on each node (Binary Search)– Follow pointer to next node
![Page 32: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/32.jpg)
Searching the B+ Tree: Find 27
17
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
Root NodePage 1
Page 4 Page 6
Page 2 Page 3 Page 5 Page 7 Page 8 Page 9
• Same as ISAM• Find key = 27
– Find split on each node (Binary Search)– Follow pointer to next node
27*
![Page 33: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/33.jpg)
Searching the B+ Tree: Fetch Data
17
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
Root NodePage 1
Page 4 Page 6
Page 2 Page 3 Page 5 Page 7 Page 8 Page 9
27*
(20, Tim) (7, Dan) (5, Kay) (3, Jim) (27, Joe) (34, Kit) (1, Kim) (42, Hal)
Page 1 Page 2 Page 3 Page 4
Page
Id,Re
cord
Id
![Page 34: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/34.jpg)
Inserting 25* into a B+ Tree Part 1
13 17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node
25*
• Find the correct leaf
![Page 35: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/35.jpg)
Inserting 25* into a B+ Tree Part 2
13 17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 24* 27* 29* 25* 33* 34* 38* 39*
Root Node
• Find the correct leaf• If there is room in the leaf just add the entry
![Page 36: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/36.jpg)
Inserting 25* into a B+ Tree Part 3
13 17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 24* 27* 25* 29* 33* 34* 38* 39*
Root Node
• Find the correct leaf• If there is room in the leaf just add the entry
• Sort the leaf page by key
![Page 37: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/37.jpg)
Inserting 8* into a B+ Tree: Find Leaf
13 17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node
8*
• Find the correct leaf
![Page 38: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/38.jpg)
Inserting 8* into a B+ Tree: Insert
13 17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node
8*
• Find the correct leaf• Split leaf if there is not enough room
![Page 39: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/39.jpg)
Inserting 8* into a B+ Tree: Split Leaf
• Find the correct leaf– Split leaf if there is not enough room– Redistribute entries evenly
13 17 24 30
2* 3* 5* 7* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node
8*
![Page 40: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/40.jpg)
5* 7* 8*
Inserting 8* into a B+ Tree: Split Leaf, cont
13 17 24 30
5* 7* 8*
2* 3* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node
2* 3*
• Find the correct leaf• Split leaf if there is not enough room• Redistribute entries evenly• Fix next/prev pointers
![Page 41: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/41.jpg)
Inserting 8* into a B+ Tree: Fix Pointers
13 17 24 30
14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node
2* 3* 5* 7* 8*
• Find the correct leaf• Split leaf if there is not enough room• Redistribute entries evenly• Fix next/prev pointers
![Page 42: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/42.jpg)
Inserting 8* into a B+ Tree: Mid-Flight
13 17 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node
• Something is still wrong!
I am an orphan!
![Page 43: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/43.jpg)
Inserting 8* into a B+ Tree: Copy Middle Key
13 17 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node
• Copy up from leaf the middle key• No room in parent? Recursively split index nodes
5
![Page 44: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/44.jpg)
13
Inserting 8* into a B+ Tree: Split Parent, Part 1
17 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
5
• Copy up from leaf the middle key• No room in parent? Recursively split index nodes
• Redistribute the rightmost d keys
302417
![Page 45: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/45.jpg)
Inserting 8* into a B+ Tree: Split Parent, Part 2
13
2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
• Copy up from leaf the middle key• No room in parent? Recursively split index nodes
• Redistribute the rightmost d keys
5 302417
![Page 46: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/46.jpg)
Inserting 8* into a B+ Tree: Root Grows Up
13
2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node
• Push up from interior node the middle key• Now the last key on left
• No room in parent? Recursively split index nodes• Redistribute the rightmost d keys
5 17 24 30
![Page 47: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/47.jpg)
13
Inserting 8* into a B+ Tree: Root Grows Up, Pt 2
• Recursively split index nodes– Redistribute right d keys– Push up middle key
17
24
2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node171717
5 30
![Page 48: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/48.jpg)
Inserting 8* into a B+ Tree: Root Grows Up, Pt 3
• Recursively split index nodes– Redistribute right d keys– Push up middle key
17
13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node
5
![Page 49: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/49.jpg)
Copy up vs Push up!
• Notice:– The leaf entry (5) was copied up– The index entry (17) was pushed up
17
13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node
5
![Page 50: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/50.jpg)
Inserting 8* into a B+ Tree: Final
• Check invariants• Key Invariant:• Node[…, (KL, PL), …] è
KL<= K for all K in PL Sub-tree• Occupancy Invariant:• d <= # entries <= 2d
17
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root Node
![Page 51: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/51.jpg)
![Page 52: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/52.jpg)
B+ Tree Insert: Algorithm Sketch1. Find the correct leaf L.
2. Put data entry onto L.– If L has enough space, done!– Else, must split L (into L and a new node L2)
• Redistribute entries evenly, copy up middle key• Insert index entry pointing to L2 into parent of L.
![Page 53: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/53.jpg)
B+ Tree Insert: Algorithm Sketch Part 2
• Step 2 can happen recursively– To split index node, redistribute entries evenly, but push up middle
key. (Contrast with leaf splits)
• Splits “grow” tree; root split increases height.– Tree growth: gets wider or one level taller at top.
![Page 54: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/54.jpg)
Before and After Observations
• Notice that the root was split to increase the height– Grow from the root not the leaves– All paths from root to leaves are equal lengths
• Does the occupancy invariant hold?– Yes! All nodes (except root) are at least half full– Proof?
After17
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 24* 27* 29* 33* 34* 38* 39*
Root NodeBefore
![Page 55: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/55.jpg)
Splitting a Leaf
• Start with full leaf (2d) entries (let d = 2)– Add a 2d + 1 entry (8*)
• Split into leaves with (d, d+1) entries– Copy key up to parent
• Why copy key and not push key up to parent?
d entries d+1 entries
2* 3* 4* 5* 8*
2* 3* 5* 7* 8*
5
![Page 56: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/56.jpg)
Splitting an Inner Node• Start with full interior node (2d) entries: (let d = 2)
– Add a 2d + 1 entry
• Split into nodes with (d, d+1) entries– Push key up to parent
d entries d+1 entries
13 17 24 30
5 13 24 30
5
17
![Page 57: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/57.jpg)
Splitting an Inner Node Pt 2• Start with full interior node (2d) entries: (let d = 2)
– Add a 2d + 1 entry
• Split into nodes with (d, d) entries– Push key up to parent
13 17 24 30 5
d entries d entries
5 13 24 30
17
![Page 58: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/58.jpg)
Splitting an Inner Node Pt 3• Start with full interior node (2d) entries: (let d = 2)
– Add a 2d + 1 entry
• Split into nodes with (d, d) entries– Push key up to parent
13 17 24 30 5
d entries d entries
5 13 24 30
17
Why push not copy?• Routing key not needed in child
Occupancy invariant holds after split
![Page 59: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/59.jpg)
Nice Animation Online
• Great animation online of B+ Trees
• One small difference to note– Upon deletion of leftmost value in a node, it
updates the parent index entry– Incurs unnecessary extra writes
![Page 60: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/60.jpg)
B+-TREE DELETION
![Page 61: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/61.jpg)
We will skip deletion
• In practice, occupancy invariant often not enforced
• Just delete leaf entries and leave space
• If new inserts come, great – This is common
• If page becomes completely empty, can delete– Parent may become underfull– That’s OK too
– Guarantees still attractive: logF(max size of tree)
![Page 62: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/62.jpg)
BULK LOADING B+-TREES
![Page 63: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/63.jpg)
Bulk Loading of B+ Tree Part 1• Suppose we want to build an index on a large table• Would it be efficient to just call insert repeatedly
– No … Why not?– Random Order: CLZARNDXEKFWIUB. Order 2.– Try it: Interactive demo
![Page 64: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/64.jpg)
Bulk Loading of B+ Tree Part 2
L
D F R W
A* B* C* D* E* F* I* K* L* N* R* U* W* X* Z*
• Constantly need to search from root• Leaves and internal nodes mostly half-empty• Modifying random pages:
poor cache efficiency
![Page 65: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/65.jpg)
Bulk Loading of B+ Tree Part 2• Constantly need to search from leaf
• Leaves and nodes are mostly half full
• Modifying random pages -> poor cache efficiency
L
D EA B C F I K L N R U W X Z
D F R W
![Page 66: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/66.jpg)
16
Smarter Bulk Loading a B+ Tree
4 7 10 134 7 10 13
1* 2* 3* 4* 5* 6* 7* 8* 9* 10* 11* 12* 13* 14* 15* 16* 17* 18*
• Sort the input records by key:• 1*, 2*, 3*, 4*, …• We’ll learn a good disk-based sort algorithm soon!
• Fill leaf pages to some fill factor (e.g. ¾)• Updating parent until full
![Page 67: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/67.jpg)
Smarter Bulk Loading a B+ Tree Part 2
4 7 10 134 7 10 13
4 7 10 1313
4 7 10
1* 2* 3* 4* 5* 6* 7* 8* 9* 10* 11* 12* 13* 14* 15* 16* 17* 18*
16
• Sort the input records by key:• 1*, 2*, 3*, 4*, …
• Fill leaf pages to some fill factor (e.g. ¾)• Update parent until full• Then split parent and copy to sibling to achieve fill factor
![Page 68: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/68.jpg)
Smarter Bulk Loading a B+ Tree Part 3
• Lower left part of the tree is never touched again• Occupancy invariant maintained
13
4 7 10
1* 2* 3* 4* 5* 6* 7* 8* 9* 10* 11* 12* 13* 14* 15* 16* 17* 18*
16
Never Touched Again
![Page 69: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/69.jpg)
Smarter Bulk Loading a B+ Tree Part 4
• Sort the input records by key:• 1*, 2*, 3*, 4*, …
• Fill leaf pages to some fill factor (e.g. ¾)• Update parent until full• Then split parent
4 7 10
1* 2* 3* 4* 5* 6* 7* 8* 9* 10* 11* 12* 13* 14* 15* 16* 17* 18*
16 19 22 25
19* 20* 21* 22* 23* 24* 25* 26* 27*
13
![Page 70: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/70.jpg)
Summary of Bulk Loading• Option 1: Multiple inserts
– Slow– Does not give sequential storage of leaves
• Option 2: Bulk Loading– Fewer I/Os during build. (Why?)– Leaves will be stored sequentially (and linked, of course)– Can control “fill factor” on pages.
![Page 71: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/71.jpg)
Summary
• ISAM is a static structure
– Only leaf pages modified; overflow pages needed– Overflow chains can degrade performance unless size of data set and data distribution stay constant
• B+ Tree is a dynamic structure– Inserts/deletes leave tree height-balanced; logFN cost– High fanout (F) means depth rarely more than 3 or 4.– Almost always better than maintaining a sorted file.– Typically, 67% occupancy on average– Usually preferable to ISAM; adjusts to growth gracefully.
![Page 72: Storage and Index 3 B+ Treeyneversky.sinaapp.com/Courses/DB/DBSP_2019_Fall_9... · Index Part 2 An indexis data structure that enables fast lookupand modification of data entries](https://reader033.fdocument.pub/reader033/viewer/2022060415/5f134f003626e25f9055c22a/html5/thumbnails/72.jpg)
Summary Cont.• Bulk loading can be much faster than repeated inserts for creating a B+ tree on a large
data set.• B+ tree widely used because of its versatility
– One of the most optimized components of a DBMS.– Concurrent Updates– In-memory efficiency