Hacking PostgreSQL. Разделяемая память и блокировки.
-
Upload
anastasia-lubennikova -
Category
Software
-
view
880 -
download
6
Transcript of Hacking PostgreSQL. Разделяемая память и блокировки.
4
Разделяемая память
src/include/storage
src/backend/storage/ipc
dsm.c procsignal.c shmqueue.c sinval.c dsm_impl.c
ipc.c procarray.c hm_mq.c sinvaladt.c ipci.c
pmsignal.c shmem.c shm_toc.c standby.c
src/backend/storage/buffer
buf_init.c bufmgr.c buf_table.c freelist.c localbuf.c
src/backend/storage/lmgr
deadlock.c lmgr.c lock.c lwlock.c s_lock.c
spin.c lwlocknames.c predicate.c proc.c
src/backend/port
5
Создание разделяемой памяти
src/backend/storage/ipc/ipci.cCreateSharedMemoryAndSemaphores()
PGSharedMemoryCreate()PGReserveSemaphores()
CreateLWLocks();InitShmemIndex();
InitBufferPool();InitLocks();
CreateSharedProcArray();...
/* * Now give loadable modules a chance to set up their shmem allocations */if (shmem_startup_hook)shmem_startup_hook();
6
System V, POSIX, mmap()
src/include/storage/pg_shmem.hsrc/backend/port/sysv_shmem.c
PGSharedMemoryCreate()CreateAnonymousSegment(&size);InternalIpcMemoryCreate(NextShmemSegID, sysvsize);
typedef struct PGShmemHeader{
int32 magic; /* magic # to identify Postgres segments */#define PGShmemMagic 679834894pid_t creatorPID; /* PID of creating process */Size totalsize; /* total size of segment */Size freeoffset; /* offset to first free space */dsm_handle dsm_control; /* ID of dynamic shared memory control seg */void *index; /* pointer to ShmemIndex table */#ifndef WIN32 /* Windows doesn't have useful inode#s */dev_t device; /* device data directory is on */ino_t inode; /* inode number of data directory */#endif
} PGShmemHeader;
7
Структуры данных
src/include/storage/shmem.hsrc/backend/storage/ipc/shmem.csrc/backend/storage/ipc/shmqueue.c
ShmemInitStruct()ShmemInitHash()SHMQueueInit()
8
ShmemIndex
src/backend/storage/ipc/shmem.c
/* primary index hashtable for shmem */static HTAB *ShmemIndex = NULL;
/* this is a hash bucket in the shmem index table */typedef struct{
char key[SHMEM_INDEX_KEYSIZE]; /* string name */void *location; /* location in shared mem */Size size; /* # bytes allocated for the structure */
} ShmemIndexEnt;
10
Shared buffers
src/include/storage/buf_internals.hsrc/backend/storage/buffer/buf_init.c
InitBufferPool()
11
pg_buffercache
CREATE EXTENSION pg_buffercache;
select* from pg_buffercache limit 1;
bufferid | 1relfilenode | 1262reltablespace | 1664reldatabase | 0relforknumber | 0relblocknumber | 0isdirty | fusagecount | 5pinning_backends | 0
12
pg_buffercache
pgbench -i -s 200 $DBNAMEpgbench -S -c 8 -t 25000 $DBNAME
shared_buffers | size ----------------+-------- 16384 | 128 MB
database_size --------------- 2998 MB
13
pg_buffercache
SELECT c.relname, count(*) AS buffersFROM pg_class c INNER JOIN pg_buffercache b ON b.relfilenode=c.relfilenode INNER JOIN pg_database d ON (b.reldatabase=d.oid AND d.datname=current_database())GROUP BY c.relnameORDER BY 2 DESCLIMIT 10;
relname | buffers ---------------------------------+--------- pgbench_accounts | 8256 pgbench_accounts_pkey | 7995 pg_amop | 5 pg_operator_oprname_l_r_n_index | 4 pg_operator | 4 pg_index | 3 pg_amop_opr_fam_index | 3 pg_amop_fam_strat_index | 3 pg_amproc_fam_proc_index | 3 pg_opclass | 3
14
pg_buffercache
SELECT usagecount,count(*),isdirtyFROM pg_buffercache GROUP BY isdirty,usagecount ORDER BY isdirty,usagecount;
usagecount | count | isdirty ------------+-------+--------- 0 | 8063 | f 1 | 7819 | f 2 | 207 | f 3 | 18 | f 4 | 29 | f 5 | 245 | f 2 | 2 | t 5 | 1 | t
15
pg_buffercache
SELECT c.relname, pg_size_pretty(count(*) * 8192::bigint) as buffered, round(100.0 * count(*) / (SELECT setting FROM pg_settings WHERE name='\''shared_buffers'\'')::integer,1) AS buffers_percent, round(100.0 * count(*) * 8192 / pg_relation_size(c.oid),1) AS percent_of_relationFROM pg_class c INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())GROUP BY c.oid,c.relnameORDER BY 3 DESCLIMIT 10;
relname | buffered | buffers_percent | percent_of_relation -----------------------+------------+-----------------+--------------------- pgbench_accounts | 64 MB | 50.3 | 2.5 pgbench_accounts_pkey | 62 MB | 48.7 | 14.6
16
pg_buffercache
SELECT c.relname, count(*) AS buffers, usagecountFROM pg_class c INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())WHERE c.relname LIKE '\''pgbench%'\''GROUP BY c.relname,usagecountORDER BY c.relname,usagecount;
relname | buffers | usagecount -----------------------+---------+------------ pgbench_accounts | 4235 | 0 pgbench_accounts | 3980 | 1 pgbench_accounts | 28 | 2 pgbench_accounts_pkey | 3833 | 0 pgbench_accounts_pkey | 3788 | 1 pgbench_accounts_pkey | 161 | 2 pgbench_accounts_pkey | 3 | 3 pgbench_accounts_pkey | 1 | 4 pgbench_accounts_pkey | 193 | 5
17
pg_buffercache
shared_buffers | size ----------------+-------- 16384 | 128 MB
relname | buffered | buffers_percent | percent_of_relation -----------------------+------------+-----------------+--------------------- pgbench_accounts | 64 MB | 50.3 | 2.5 pgbench_accounts_pkey | 62 MB | 48.7 | 14.6
shared_buffers | size ----------------+--------- 262144 | 2048 MB
relname | buffered | buffers_percent | percent_of_relation -----------------------+------------+-----------------+--------------------- pgbench_accounts | 1180 MB | 57.6 | 46.1 pgbench_accounts_pkey | 418 MB | 20.4 | 97.5
21
Блокировки
● Блокировки на уровне объектов БД (heavy-weight locks)– pg_locks, pg_stat_activity– определение дедлоков– освобождаются в конце транзакции
● LWLocks (lightweight locks)– блокировки на чтение/запись (shared/exclusive) для структур в
разделяемой памяти– нет определения дедлоков– блокируют query_cancel() и die()
● Spinlocks– в основном инфраструктура для LWLocks– TAS (test-and-set) инструкции процессора– блокируют query_cancel() и die()
22
Блокировки
src/backend/storage/lmgr/README
● Блокировки на уровне объектов БД (heavy-weight locks)– src/backend/storage/lmgr/lmgr.c– src/backend/storage/lmgr/lock.c
● LWLocks (lightweight locks)– src/backend/storage/lmgr/lwlock.c
● Spinlocks– src/backend/storage/lmgr/spin.c – эмуляция через семафоры– src/backend/storage/lmgr/s_lock.c
23
Улучшения в 9.6
● Allow Pin/UnpinBuffer to operate in a lockfree manner
– PostgreSQL на многоядерных серверах Power 8
● Provide much better wait information in pg_stat_activity
– Мониторинг ожиданий в PostgreSQL
– Интерфейс ожиданий в PostgreSQL
– Monitoring Wait Events in PostgreSQL 9.6. pg_wait_sampling
– pg_stat_* documentation
24
Обычная стратегия замещения
src/backend/storage/buffer/freelist.c
Основано на презентации “Illustrated guide to the buffer cache”, Selena Deckelmann
32
Background writer
● bgwriter_delay (integer)● bgwriter_lru_maxpages (integer)● bgwriter_lru_multiplier (floating point)● bgwriter_flush_after (int) 9.6
34
Специальная стратегия замещения
src/include/storage/bufmgr.h
● SeqScan
– 256 KB– Вытесняет dirty buffers в общее кольцо
● VACUUM
– 256 KB– Ожидает записи WAL на диск
● BulkWrite (COPY IN, CREATE TABLE AS SELECT)
– 16 MB– Ожидает записи WAL на диск
● Возможно стоит использовать BulkWrite стратегию для UPDATE и DELETE на больших таблицах
35
Улучшения в 9.6
● Allow to trigger kernel writeback after a configurable number of writes.
● Checkpoint sorting and balancing.● Extend relations multiple blocks at a time to improve scalability.
36
Проблемы: Hash-таблица
● Hash-таблица– Большие ключи
– Нельзя эффективно упорядочивать страницы
– Удаление таблиц дорогое (?)
Решение: Замена hash-таблицы на Tree of Radix tree.
37
Проблемы: clock sweep
● Clock sweep
– Каждый бэкенд выполняет clock sweep сам
Решение: передать эту работу отдельному бэкенд процессу.
● Clock sweep
– В худшем случае нужно прочитать Nbuffers*5 буферов
– Часто большинство буферов имеют usagecount=5 (это и есть худший случай)
● Буферы никогда не возвращаются во freelist
● bg_writer_delay – задается фиксированно
Решение: ???
39
Источники
● Monitoring PostgreSQL Buffer Cache Internals. 2010
● Illustrated buffer cache. 2010
● Setting up shared memory for PostgreSQL. 2013
● Inside PostgreSQL Shared Memory. 2015
● CheckPoint Internals of PostgreSQL and Oracle. 2013
● Improving Postgres' Buffer Manager. 2016