HandlerSocket plugin for MySQL Jun 29, 2010 DeNA Technology Seminar @ Yoyogi IT Platform Dept.,...
-
Upload
ella-bradford -
Category
Documents
-
view
214 -
download
0
Transcript of HandlerSocket plugin for MySQL Jun 29, 2010 DeNA Technology Seminar @ Yoyogi IT Platform Dept.,...
HandlerSocket plugin for HandlerSocket plugin for MySQLMySQL
Jun 29, 2010 Jun 29, 2010 DeNA Technology Seminar @ YoyogiDeNA Technology Seminar @ Yoyogi
IT Platform Dept., System Management DivisionIT Platform Dept., System Management DivisionDeNA Co.,Ltd.DeNA Co.,Ltd.Akira Higuchi <higuchi dot akira at dena dot jp>Akira Higuchi <higuchi dot akira at dena dot jp>
Who am I?Who am I?
Akira Higuchi, Ph.D. in science Akira Higuchi, Ph.D. in science IT Platform Dept., DeNA Co.,Ltd. IT Platform Dept., DeNA Co.,Ltd.
system-wide performance optimizationsystem-wide performance optimization middleware developmentmiddleware development
The creator of HandlerSocket pluginThe creator of HandlerSocket plugin Using GNU/Linux since 1993Using GNU/Linux since 1993
Fedora: yum install KoboDeluxeFedora: yum install KoboDeluxe Debian: apt-get install kobodeluxeDebian: apt-get install kobodeluxe
About HandlerSocket pluginAbout HandlerSocket plugin
What is HandlerSocket?What is HandlerSocket?
Non-SQL interface for MySQLNon-SQL interface for MySQL
What HandlerSocket aimsWhat HandlerSocket aims
Executes simple CRUD operations Executes simple CRUD operations fastfast Omit SQL parsingOmit SQL parsing Combine multiple requests on the server Combine multiple requests on the server
side side Allows SQL on the same databaseAllows SQL on the same database
Only simple operations can be faster Only simple operations can be faster Seamless migration from SQL queriesSeamless migration from SQL queries
HandlerSocket pluginHandlerSocket plugin
Offers a direct and non-SQL interface to MySQL Offers a direct and non-SQL interface to MySQL storage enginesstorage engines
Own TCP/IP listenerOwn TCP/IP listener Talks a text protocolTalks a text protocol There is a C++ and a Perl client librariesThere is a C++ and a Perl client libraries Only works with LinuxOnly works with Linux The source code is here:The source code is here:
https://github.com/ahiguti/HandlerSocket-Plugin-for-Myhttps://github.com/ahiguti/HandlerSocket-Plugin-for-MySQLSQL
More infos on the DeNA Tech BlogMore infos on the DeNA Tech Blog http://engineer.dena.jp/http://engineer.dena.jp/ (in Japanese) (in Japanese)
ConstructionConstruction
Handler Interface
Innodb MyISAM Other storage engines …
SQL Layer Handlersocket Plugin
Listener for libmysql
libmysql libhsclient
Applications
mysqld
client app
Other NoSQL interfaces to Other NoSQL interfaces to MySQLMySQL
mycachedmycached http://developer.cybozu.co.jp/kazuho/2009/http://developer.cybozu.co.jp/kazuho/2009/
08/mycached-memcac.html08/mycached-memcac.html Works with any storage enginesWorks with any storage engines Talks the memcached protocolTalks the memcached protocol
NDB APINDB API http://dev.mysql.com/doc/ndbapi/en/index.http://dev.mysql.com/doc/ndbapi/en/index.
htmlhtml Dedicated for the ndbcluster engineDedicated for the ndbcluster engine
PerformancePerformance
PerformancePerformance
241009
159407
60191
15771
0 50000 100000 150000 200000 250000 300000
1 column
50 columns
(requests/sec)
handlersocket
libmysql
Handlersocket executes simple read queries 4x Handlersocket executes simple read queries 4x faster than mysqld/libmysqlfaster than mysqld/libmysql
Very effective when many columns are retrievedVery effective when many columns are retrieved The reason is described laterThe reason is described later
Commands supported by Commands supported by HandlerSocket (for reading HandlerSocket (for reading
data)data) In pseudo-SQL...In pseudo-SQL...
SELECT f1, .. , fn FROM db.tableSELECT f1, .. , fn FROM db.table
WHERE k1, ... , km = v1, ... , vmWHERE k1, ... , km = v1, ... , vm
ORDER BY index_i LIMIT offset, limitORDER BY index_i LIMIT offset, limit (k1, ... , km) are the key fields (or a (k1, ... , km) are the key fields (or a
prefix) of the index_iprefix) of the index_i =, >=, >, <=, and < can be used for a =, >=, >, <=, and < can be used for a
comparatorcomparator
Commands supported by Commands supported by HandlerSocket (for modifying HandlerSocket (for modifying
data)data)
UPDATE, DELETE, and INSERTUPDATE, DELETE, and INSERT Does not support transactionsDoes not support transactions Modifications are recorded to the Modifications are recorded to the
binary log in the row-based formatbinary log in the row-based format Modifications are durableModifications are durable
Command exampleCommand example
create table db1.table1 (k int key, v char(20))create table db1.table1 (k int key, v char(20)) insert into db1.table1 values (234, 'foo'), (678, ‘bar’)insert into db1.table1 values (234, 'foo'), (678, ‘bar’)
$ telnet localhost 9998Trying 127.0.0.1...Connected to localhost.Escape character is '^]'.P 0 db1 table1 PRIMARY k,v0 10 = 1 2340 2 234 foo0 = 1 6780 2 678 bar
opens the PK
find k = 234
find k = 678
Why fast?Why fast?
No SQL parsingNo SQL parsinglow CPU usagelow CPU usage
Executes multiple requests in bulkExecutes multiple requests in bulklow CPU/Disk usagelow CPU/Disk usage
Own client/server protocolOwn client/server protocolsmall network transmission sizesmall network transmission size
Eliminating CPU Eliminating CPU consumptionconsumption
oprofile results – oprofile results – libmysql/mysqldlibmysql/mysqld
Executes “SELECT v from table where k = ?” many Executes “SELECT v from table where k = ?” many timestimes
samples| %|------------------ 9669940 53.1574 mysqld 4438098 24.3970 vmlinux 1835976 10.0927 libpthread-2.5.so 1680656 9.2389 libc-2.5.so 397970 2.1877 e1000e 89136 0.4900 oprofiled 42881 0.2357 oprofile
oprofile results – oprofile results – libmysql/mysqldlibmysql/mysqld
samples % symbol name748022 7.7355 MYSQLparse(void*)219702 2.2720 my_pthread_fastmutex_lock205606 2.1262 make_join_statistics(JOIN*, TABLE_LIST*,198234 2.0500 btr_search_guess_on_hash180731 1.8690 JOIN::optimize()177120 1.8317 row_search_for_mysql171185 1.7703 lex_one_token(void*, void*)162683 1.6824 alloc_root131823 1.3632 read_view_open_now122795 1.2699 mysql_select(THD*, Item***, TABLE_LIST*,100276 1.0370 open_table(THD*, TABLE_LIST*, st_mem_root*,99575 1.0297 mem_pool_fill_free_list96434 0.9973 build_template(row_prebuilt_struct*, THD*,86349 0.8930 get_hash_symbol(char const*, unsigned int,
CPU usage inside mysqldCPU usage inside mysqld
oprofile results – oprofile results – libmysql/mysqldlibmysql/mysqld
samples % symbol name204393 4.6054 schedule118648 2.6734 tcp_sendmsg115832 2.6099 tcp_recvmsg106537 2.4005 tcp_v4_rcv103915 2.3414 tcp_ack103534 2.3328 system_call93864 2.1150 dev_queue_xmit86831 1.9565 __mod_timer85891 1.9353 tcp_rcv_established84083 1.8946 .text.task_rq_lock
CPU usage inside the Linux kernelCPU usage inside the Linux kernel
oprofile results – oprofile results – libmysql/mysqldlibmysql/mysqld
libmysql/mysqldlibmysql/mysqld Much CPU time spent in mysqldMuch CPU time spent in mysqld Parsing SQL is slowParsing SQL is slow schedule() is called frequentlyschedule() is called frequently
oprofile results – oprofile results – HandlerSocketHandlerSocket
samples| %|------------------ 1919039 51.0453 vmlinux 811998 21.5987 mysqld 421215 11.2041 libpthread-2.5.so 207166 5.5105 e1000e 191566 5.0955 handlersocket.so 188618 5.0171 libc-2.5.so 13622 0.3623 oprofiled 5707 0.1518 oprofile
CPU usage inside MySQL with HandlerSocketCPU usage inside MySQL with HandlerSocket
oprofile results – oprofile results – HandlerSocketHandlerSocket
samples % symbol name119684 14.7394 btr_search_guess_on_hash58202 7.1678 row_search_for_mysql46946 5.7815 mutex_delay38617 4.7558 my_pthread_fastmutex_lock37707 4.6437 buf_page_get_known_nowait36528 4.4985 rec_get_offsets_func34625 4.2642 build_template(row_prebuilt_struct*, THD*, TABLE*, 20024 2.4660 row_sel_store_mysql_rec19347 2.3826 btr_cur_search_to_nth_level16701 2.0568 row_sel_convert_mysql_key_to_innobase13343 1.6432 cmp_dtuple_rec_with_match11381 1.4016 ha_innobase::index_read(unsigned char*, 11176 1.3764 dict_index_copy_types10762 1.3254 mtr_memo_slot_release10734 1.3219 ha_innobase::init_table_handle_for_HANDLER()
CPU consumption in mysqldCPU consumption in mysqld
oprofile results – oprofile results – HandlerSocketHandlerSocket
samples % symbol name129038 6.7241 tcp_sendmsg80080 4.1729 tcp_v4_rcv69658 3.6298 dev_queue_xmit66171 3.4481 .text.skb_release_data63316 3.2994 __qdisc_run60279 3.1411 tcp_recvmsg59703 3.1111 ip_output58462 3.0464 .text.skb_release_head_state48876 2.5469 tcp_ack48733 2.5394 __alloc_skb45660 2.3793 ip_queue_xmit44671 2.3278 tcp_transmit_skb
CPU consumption in the Linux kernelCPU consumption in the Linux kernel
oprofile results – oprofile results – HandlerSocketHandlerSocket
HandlerSocketHandlerSocket Most CPU time is consumed in the kernelMost CPU time is consumed in the kernel schedule() is not called frequentlyschedule() is not called frequently Inside mysqld, innodb eats most CPU Inside mysqld, innodb eats most CPU
timetime
Executing multiple requests Executing multiple requests in bulkin bulk
Threading modelThreading model
mysqld:mysqld: Thread per connection (MySQL 5)Thread per connection (MySQL 5) Thread pooling (MySQL 6?)Thread pooling (MySQL 6?)
Threading modelThreading model
HandlerSocket:HandlerSocket: Small number of threadsSmall number of threads Many connections per threadMany connections per thread
Uses epoll()Uses epoll() Virtually unlimited number of concurrent Virtually unlimited number of concurrent
connectionsconnections Small memory footprintSmall memory footprint
HandlerSocket reader HandlerSocket reader threadthread
reads requests from many clients
locks the DB, gets a read view
executes many requests
unlocks the DB
returns responses to clients
locks/unlocks (1/#conns) times per request
handlersocket reader thread
HandlerSocket writer threadHandlerSocket writer thread
reads requests from many clients
locks the DB, begins a transaction
executes multiple requests
commits, and unlocks the DB
returns responses to clients
handlersocket writer thread
executes multiple ops in a single transaction
Write throughputWrite throughput
Condition:Condition: Durable writeDurable write
sync_binlog = 1sync_binlog = 1 innodb_flush_log_at_trx_commit = 1innodb_flush_log_at_trx_commit = 1 innodb_support_xa = 1innodb_support_xa = 1
Write-back cache with BBU, or SSDWrite-back cache with BBU, or SSD Throughput:Throughput:
MySQL: up to 1000 qpsMySQL: up to 1000 qps HandlerSocket: up to 30000 qpsHandlerSocket: up to 30000 qps
How HandlerSocket locks How HandlerSocket locks tablestables
MyISAM:MyISAM: Shared-exclusive lockShared-exclusive lock
InnoDB:InnoDB: Reader threads don’t blockReader threads don’t block Only one writer thread can be executed at the Only one writer thread can be executed at the
same timesame time
HandlerSocket requests are deadlock-freeHandlerSocket requests are deadlock-free Only simple operations are supportedOnly simple operations are supported
Client/server protocolClient/server protocol
MySQL C/S protocolMySQL C/S protocol
write(3, "L\0\0\0\3select column0,column1,column2,column3,column4 from write(3, "L\0\0\0\3select column0,column1,column2,column3,column4 from db_1.table_1 where k=15", 80) = 80db_1.table_1 where k=15", 80) = 80
read(3, "\1\0\0\1\0056\0\0\2\3def\read(3, "\1\0\0\1\0056\0\0\2\3def\44db_1db_1\7\7table_1table_1\7\7table_1table_1\7\7column0column0\7\7column0column0\f\r\0<\\f\r\0<\0\0\0\375\200\0\0\0\0006\0\0\3\3def\0\0\0\375\200\0\0\0\0006\0\0\3\3def\44db_1db_1\7\7table_1table_1\7\7table_1table_1\7\7column1column1\7\7column1column1\f\r\0<\\f\r\0<\0\0\0\375\200\0\0\0\0006\0\0\4\3def\0\0\0\375\200\0\0\0\0006\0\0\4\3def\44db_1db_1\7\7table_1table_1\7\7table_1table_1\7\7column2column2\7\7column2column2\f\r\0<\\f\r\0<\0\0\0\375\200\0\0\0\0006\0\0\5\3def\0\0\0\375\200\0\0\0\0006\0\0\5\3def\44db_1db_1\7\7table_1table_1\7\7table_1table_1\7\7column3column3\7\7column3column3\f\r\0<\\f\r\0<\0\0\0\375\200\0\0\0\0006\0\0\6\3def\0\0\0\375\200\0\0\0\0006\0\0\6\3def\44db_1db_1\7\7table_1table_1\7\7table_1table_1\7\7column4column4\7\7column4column4\f\r\0<\\f\r\0<\0\0\0\375\200\0\0\0\0\5\0\0\7\376\0\0\"\0\n\0\0\0\375\200\0\0\0\0\5\0\0\7\376\0\0\"\0\n\0\0\10\0010\0\10\00100\001\00111\001\00122\001\00133\001\00144\5\0\0\t\376\0\0\"\0", 16384) = 327\5\0\0\t\376\0\0\"\0", 16384) = 327
when the above query is executed...
SELECT column0, column1, column2, column3, column4 FROM db_1.table_1 where k = 15
HandlerSocket C/S protocolHandlerSocket C/S protocol
write(3, "1\t=\t1\t15\n", 9) = 9write(3, "1\t=\t1\t15\n", 9) = 9read(3, "0\t5\t0\t1\t2\t3\t4\n", 8192) = 14read(3, "0\t5\t0\t1\t2\t3\t4\n", 8192) = 14
when an equivalent query is executed using handlersocket...
libmysqllibmysql handlersockethandlersocket
requestrequest 80 bytes80 bytes 9 bytes9 bytes
responseresponse 327 bytes327 bytes 14 bytes14 bytes
MySQL C/S protocolMySQL C/S protocol The strace result shows that MySQL C/S The strace result shows that MySQL C/S
protocol is verboseprotocol is verbose Result-set metadataResult-set metadata
http://forge.mysql.com/wiki/MySQL_Internals_ClientServehttp://forge.mysql.com/wiki/MySQL_Internals_ClientServer_Protocol#Field_Packetr_Protocol#Field_Packet
Result-set metadata become very large if Result-set metadata become very large if a result-set has many columnsa result-set has many columns
Neither a HANDLER statement nor a Neither a HANDLER statement nor a server-side prepared statement does not server-side prepared statement does not help to avoid this problemhelp to avoid this problem
Client librariesClient libraries
libhsclientlibhsclient
Client library for C++Client library for C++
Net::HandlerSocketNet::HandlerSocket
Client library for PerlClient library for Perl Invokes libhsclient via XSInvokes libhsclient via XS
my $cli = new Net::HandlerSocket({host => ‘localhost’, port => 9999});
$cli->open_index(1, ‘db1’, ‘table1’, ‘PRIMARY’, ‘k,v’);my $res = $cli->exec_multi([
[ 1, ‘=‘, [ ’33’ ], 1, 0 ],[ 1, ‘=‘, [ ’44’ ], 1, 0, ‘U’, [ ’44’, ‘hoge’ ] ],[ 1, ‘>=‘, [ ’55’ ], 10, 20 ],
]);
Configuration hintsConfiguration hints
HandlerSocket configuration HandlerSocket configuration optionsoptions
handlersocket_threads = 16handlersocket_threads = 16 Number of reader threadsNumber of reader threads Recommended value is the number of logical Recommended value is the number of logical
CPUCPU handlersocket_thread_wr = 1handlersocket_thread_wr = 1
Number of writer threadsNumber of writer threads Recommended value is ... 1Recommended value is ... 1
handlersocket_port = 9998handlersocket_port = 9998 Listening port for reader requestsListening port for reader requests
handlersocket_port_wr = 9999handlersocket_port_wr = 9999 Listening port for writer requestsListening port for writer requests
Other configuration optionsOther configuration options
innodb_buffer_pool_sizeinnodb_buffer_pool_size As large as possibleAs large as possible
innodb_log_file_size, innodb_log_file_size, innodb_log_files_in_groupinnodb_log_files_in_group As large as possibleAs large as possible
innodb_thread_concurrency = 0innodb_thread_concurrency = 0 open_files_limit = 65535open_files_limit = 65535
Number of file descriptors mysqld can openNumber of file descriptors mysqld can open HandlerSocket can handle up to 65000 HandlerSocket can handle up to 65000
concurrent connectionsconcurrent connections
Other configuration optionsOther configuration options
innodb_adaptive_hash_index = 1innodb_adaptive_hash_index = 1 Adaptive has index is fast, but consume Adaptive has index is fast, but consume
memorymemory
Options related to durabilityOptions related to durability
sync_binlog = 1sync_binlog = 1 innodb_flush_log_at_trx_commit = 1innodb_flush_log_at_trx_commit = 1 innodb_support_xa = 1innodb_support_xa = 1
Benchmark resultsBenchmark results
BenchmarkBenchmark Server:Server:
Core2Quad Q6600Core2Quad Q6600 CentOS 5.4CentOS 5.4 Single EXPI9301CT(e1000e)Single EXPI9301CT(e1000e) Single Intel X25-E (write-back cache disabled)Single Intel X25-E (write-back cache disabled)
Schema:Schema: CREATE TABLE table1 (k varchar(32) KEY, v varchar(32)) engine = CREATE TABLE table1 (k varchar(32) KEY, v varchar(32)) engine =
INNODB;INNODB; Read benchmark:Read benchmark:
10000000 records10000000 records SELECT v from table1 where k = ?SELECT v from table1 where k = ? Random accessRandom access
Write benchmark:Write benchmark: 10000000 records10000000 records UPDATE table SET v = ? where k = ?UPDATE table SET v = ? where k = ? Random accessRandom access Durable writeDurable write
sync_binlog = 1sync_binlog = 1 innodb_flush_log_at_trx_commit = 1innodb_flush_log_at_trx_commit = 1 innodb_support_xa = 1innodb_support_xa = 1
Throughput (reads)Throughput (reads)
0
50000
100000
150000
200000
250000
3000001 2 4 8 16 32 64 128
256
512
1024
2048
4096
8192
1638
4
# of concurrent connections
quer
ies
per se
c
handlersocket read
mysql read
handlersocket write
mysql write
Throughput (writes)Throughput (writes)
1
10
100
1000
10000
1000001 2 4 8
16
32
64
128
256
512
1024
2048
4096
8192
16384
# of concurrent connections
quer
ies
per
sec
handlersocket write
mysql write
Maximum response timeMaximum response time
0
10
20
30
40
50
60
1 2 4 8 16 32 64 128
256
512
1024
2048
4096
8192
1638
4
# of concurrent connections
max
res
pons
e tim
e(se
c)
handlersocket read
mysql read
handlersocket write
mysql write
Average response timeAverage response time
0
1
2
3
4
5
6
1 2 4 8 16 32 64 128
256
512
1024
2048
4096
8192
1638
4
# of concurrent connections
aver
age
resp
onse
tim
e(se
c)
handlersocket read
mysql read
handlersocket write
mysql write
Issues and future plansIssues and future plans
IssuesIssues
Difficult to buildDifficult to build Requires the source of mysqlRequires the source of mysql MySQL binary compatibility?MySQL binary compatibility?
Future plansFuture plans
‘‘where’ clausewhere’ clause Atomic read-modify-write operationsAtomic read-modify-write operations SQL support?SQL support? More language bindingsMore language bindings