InnoDB: архитектура транзакционного хранилища...

download InnoDB: архитектура транзакционного хранилища (Константин Осипов)

If you can't read please download the document

Transcript of InnoDB: архитектура транзакционного хранилища...

InnoDB Features and Latest Enhancements MySQL Users Conference 2006 April 26, Santa Clara

InnoDB:

Highload++, 2010 , [email protected]

Everybody knows Heikki.Introduce mysqlf

.

, DBA :InnoDB Database Files

InnoDB threads

InnoDB data structures and algorithms

InnoDB Database Files

ibdata files

System tablespace

internal
data
dictionary

MySQL Data Directory

InnoDB
tables

OR

innodb_file_per_table

.ibd files

.frm files

undo
logs

insert buffer

InnoDB Tablespaces

Extent

Segment

Extent

Extent

Extent

an extent = 64 pages

Extent

Trx id

Row

Field 1

Roll pointer

Field pointers

Field 2

Field n

Row

Page

Row

Row

Row

Row

Leaf node segment

Tablespace

Rollback segment

Non-leaf node segment

Row

Row

The following 32 pages are allocated individually (from the fragmented extent space); after that, full 64 page extents are allocatedUsing multiple tablespaces can be beneficial to users who want to move specific tables to separate physical disks or who wish to restore backups of single tables quickly without interrupting the use of the remaining InnoDB tables.

InnoDB Pages

A page consists of: a page header, a page trailer, and a page body (rows or other contents).

Page header

Page trailer

row offset array

Row

Row

Row

Row

Row

Row

Row

Row

Row

Row

Row

Also overflow page pointers

Whether any columns are stored off-page depends on the page size and the total size of the row. When the row is too long to fit entirely within thepage of the clustered index, InnoDB will choose the longest columns for off-page storage until the row fits on the clustered index page.

InnoDB Rows

prefix(768B)

overflowpage

COMACT format

Record hdr Trx ID Roll ptr Fld ptrs overflow-page ptr .. Field values

overflowpage

DYNAMIC format

20 bytes

COMPACT mode always stores up to 768-byte prefix of such columns in the clustered index pageNew DYNAMIC mode stores long columns entirely off-page, with only a 20-byte prefix in the clustered index pageIf row does not fit in clustered index page, some long BLOB or VARCHAR column(s) may be stored on an overflow page

the REDUNDANT format is available to retain compatibility with older versions of MySQL

InnoDB Indexes - Primary

Data rows are stored in the B-tree leaf nodes of a clustered index

B-tree is organized by primary key or non-null unique key of table, if defined; else, an internal column with 6-byte ROW_ID is added.

xxx
-nnn

001
-
275

276
500

clustered(primary key)
index

501
-
630

631
-
768

769
-
800

801
-
949

950
-
xxx

001
500

801
nnn

500
800

PK values001 - nnn

Key values
501-630+ data for corresponding rows

Primary Index

If you do not define a PRIMARY KEY for your table, InnoDB uses the first UNIQUE index that has only NOT NULL columns as the clustered index. If there is no such index in the table, InnoDB internally generates a clustered index where the rows are ordered by the row ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.

All InnoDB indexes are B-trees where the index records are stored in the leaf pages of the tree. The default size of an index page is 16KB. When new records are inserted, InnoDB tries to leave 1/16 of the page free for future insertions and updates of the index records.

InnoDB Tablespaces

A tablespace consists of multiple files and/or raw disk partitions. file_name:file_size[:autoextend[:max:max_file_size]]

A file/partition is a collection of segments.

A segment consists of fixed-length pages.

The page size is always 16KB in uncompressed tablespaces, and 1KB-16KB in compressed tablespaces (for both data and index).

If you specify the autoextend option for the last data file, InnoDB extends the data file if it runs out of free space in the tablespace. The increment is 8MB at a time by default. It can be modified by changing the innodb_autoextend_increment system variable. You can use raw disk partitions as data files in the shared tablespace.Remember that only the last data file in the innodb_data_file_path can be specified as auto-extending.

System Tablespace

Internal Data Dictionary

Undo

Insert Buffer

Doublewrite Buffer

MySQL Replication Info

Note: InnoDB always needs the shared tablespace because it puts its internal data dictionary and undo logs there.

To support recovery and unique features.

DATA

InnoDB Logging

Rollback segments

Log Buffer

Buffer Pool


redo
log

rollback

Log File
#1

Log File
#2

log thread

write thread

log files

ibdata files

Combination of physical (disk address) and logical (field content) logging -- Physiological LoggingLogging is used for durability at crash recovery

InnoDB keeps two logs, the redo log and the undo log.The redo log is for re-doing data changes that had not been written to disk when a crash occurred. The undo log is primarily for removing data changes that had been written to disk when a crash occurred, but should not have been written, because they were for uncommitted transactions. The undo log is inside the tablespace.

The undo log is primarily for removing data changes that had been written to disk when a crash occurred, but should not have been written, because they were for uncommitted transactions. The undo log is inside the tablespace. The "insert" section of the undo log is needed only for transaction rollback and can be discarded at COMMIT time. The "update/delete" section of the undo log is also useful for consistent reads, and can be discarded when InnoDB has ended all transactions that might need the undo log records to reconstruct earlier versions of rows. An undo log record's contents are:

Primary Key Value (not a page number or physical address), Old Transaction ID (of the transaction that updated the row), and the changes (only old values).

COMMIT will write the contents of the log buffer to disk, and put undo log records in a history list. ROLLBACK will delete undo log records that are no longer needed. PURGE (an internal operation that occurs outside user control) will no-longer-necessary undo log records and, for data records that have been marked for deletion and are no longer necessary for consistent read, will remove the records.

InnoDB Redo Log

Redo log structure:

Space id PageNo OpCode Data

end of log

min LSN

start of log

last checkpoint

There is one redo log for the entire workspace, it contains multiple files, it is circular.

The file header includes the last successful checkpoint. A redo log record's contents are: Space id, Page Number (4 bytes = page number within tablespace), Offset of change within page (2 bytes), Log Record Type (insert, update, delete, "fill space with blanks", etc.), and the changes on that page (only after images, not before images).

Redo Logging

The redo log remembers EVERY operation on any page in the database

Redo log record format:

An example of a redo log record:= V5.1

Use innodb_locks_unsafe_for_binlog to remove gap locking in MySQL-5.0 and earlier

D

DDeadlock Detection & Rollback

InnoDB automatically detects deadlocks if it detects a cycle in waits-for graph of transactions

ABC

waits-for graphGiven a deadlock, InnoDB chooses the transaction that modified the fewest rows as the victim, and rolls it back

Note: InnoDB cannot detect deadlocks that span MySQL storage engines

Set innodb_lock_wait_timeout
in my.cnf, to break deadlocks
via timeout (default 50 sec)

1

1