Concurrency Model for MySQL Data Processing @ RubyConf Taiwan 2012

Concurrency Model for Huge MySQL Data, Mu-Fan Teng (@ryudoawaru), RubyConf Taiwan 2012, Saturday, December 8, 2012

Description

Video: http://www.youtube.com/watch?v=3Oi_IjKDZtg

Transcript of Concurrency Model for MySQL Data Processing @ RubyConf Taiwan 2012

Page 1: Concurrency model for mysql data processing@rubyconf.tw 2012

Concurrency Model for huge MySQL data

Mu-Fan Teng (@ryudoawaru), RubyConf Taiwan 2012

Page 2

Background

Page 3

Legacy environment

• A MySQL database with 2.3 GB of data: Big5-encoded text stored under the ISO-8859-1 (latin1) charset.

• The biggest table in the DB is 1.5 GB.

Page 4

The purpose

Page 5

Transcoding to UTF-8

Page 6

Try

Page 7

Work Flow

1. mysqldump with the --default-character-set=latin1 parameter to generate an SQL file.

2. Transcode the SQL file with a tool like iconv/bsdconv.

3. Edit the transcoded SQL file to work around the "slash" (backslash-escaping) problem.

4. Restore the SQL file into a new DB.
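
The four steps could be strung together roughly as below. This is only a sketch: the database name, file names, and exact tool flags are assumptions, and step 3 (fixing escaped-slash artifacts by hand) stays manual.

```ruby
# Hypothetical pipeline for the four steps above; db/file names are made up.
db    = 'mydb'
steps = [
  # 1. dump with the connection charset forced to latin1
  "mysqldump --default-character-set=latin1 #{db} > dump.sql",
  # 2. transcode the dump with iconv (or bsdconv)
  "iconv -f BIG5 -t UTF-8 -c dump.sql > dump-utf8.sql",
  # 3. hand-edit dump-utf8.sql for the "slash" problem (manual step)
  # 4. restore into a fresh database
  "mysql --default-character-set=utf8 #{db}_new < dump-utf8.sql",
]
steps.each { |cmd| puts cmd } # in practice: system(cmd) or abort(cmd)
```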

Page 8

Failed!

Page 9

Cause

• The file is too big for most text editors.

• There is a lot of mis-encoded text.

Page 10

Let’s reinvent the wheel!

Page 11

The new work flow

• Connect to the DB

• Transcode

• Output the DB rows as SQL INSERT statements

• Write the SQL file

Page 12

CORES_COUNT = 4
LIMIT = (ARGV[0] || 10_000).to_i

sqls = CORES_COUNT.times.map do |x|
  sprintf("SELECT * FROM cdb_posts ORDER BY pid LIMIT %d OFFSET %d;", LIMIT, x * LIMIT)
end

class String
  def to_my_val
    "'#{Mysql2::Client.escape self.force_encoding('Big5-UAO').encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '??')}'"
  end
end

procs = sqls.map do |sql|
  Proc.new do |out|
    Mysql2::Client.new(database: DBNAME, reconnect: true, encoding: 'latin1').query(sql).each(as: :array) do |row|
      out.print "INSERT INTO `cdb_posts` VALUES (#{row.map(&:to_my_val).join(',')});\n"
    end
  end
end

procs.each { |p| p.call(OUT) }

Page 13

Thanks to Ruby 1.9's awesome Encoding class, which supports Big5-UAO.
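
A minimal sketch of what that support gives you: bytes that are really Big5 can be relabeled with force_encoding and transcoded, using the same encode options as to_my_val above (the sample bytes are "中文" in Big5):

```ruby
# Relabel raw Big5 bytes and transcode to UTF-8, replacing anything
# invalid or unmappable with '??' (same options as the talk's to_my_val).
bytes = "\xA4\xA4\xA4\xE5".b # "中文" in Big5
utf8  = bytes.force_encoding('Big5-UAO')
             .encode('UTF-8', invalid: :replace, undef: :replace, replace: '??')
puts utf8 # → 中文
```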

Page 14

This removed almost 80% of the encoding problems.

Page 15

But the file is so big that the transcoding takes too long to wait for!

Page 16

So I had to find a concurrency model to make it faster.

Page 17

Experiment Target

• Test the performance difference between the thread and fork models.

Page 18

H/W Platform

• 4-core Intel Core 2 Quad CPU

• 8GB RAM

• 1*SSD

• OS X 10.8

• MRI 1.9.3p194

Page 19

DBNAME = 'wwwfsc'
CORES_COUNT = 4
ForceEncoding = 'Big5-UAO'
LIMIT = (ARGV[0] || 10_000).to_i
OUT = '/dev/null'

sqls = CORES_COUNT.times.map do |x|
  sprintf("SELECT * FROM cdb_posts ORDER BY pid LIMIT %d OFFSET %d;", LIMIT, x * LIMIT)
end

class String
  def to_my_val
    "'#{Mysql2::Client.escape self.force_encoding(ForceEncoding).encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '??')}'"
  end
end

procs = sqls.map do |sql|
  Proc.new do |out|
    open(out, 'w') do |io|
      Mysql2::Client.new(database: DBNAME, reconnect: true, encoding: 'latin1').query(sql).each(as: :array) do |row|
        io.print "INSERT INTO `cdb_posts` VALUES (#{row.map(&:to_my_val).join(',')});\n"
      end
    end
  end
end

Page 20

Benchmark.bm(15) do |x|
  x.report("Thread") { procs.map { |p| Thread.new { p.call(OUT) } }.each(&:join) }
  x.report("Fork")   { procs.each { |p| fork { p.call(OUT) } }; Process.waitall }
  x.report("Normal") { procs.each { |p| p.call(OUT) } }
end

Page 21

Result of 100k*4 rows

Page 22

Thread

Page 23

Fork

Page 24

Fork

Page 25

Observations

• Thread

‣ CPU utilization stays between 105 and 125 percent.

• Fork

‣ Utilization fluctuates frequently across the processes.

Page 26

The GVL still has an effect.
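
The effect can be reproduced without MySQL at all. In this illustrative sketch (not from the slides), CPU-bound Ruby gains essentially nothing from threads under MRI, which matches the 105–125% CPU utilization seen above:

```ruby
require 'benchmark'

# CPU-bound work: pure Ruby arithmetic, so the GVL serializes it.
work = ->{ 500_000.times { Math.sqrt(rand) } }

serial  = Benchmark.realtime { 4.times { work.call } }
threads = Benchmark.realtime { 4.times.map { Thread.new { work.call } }.each(&:join) }

printf("serial: %.2fs  threads: %.2fs\n", serial, threads)
```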

Page 27

Try again

Page 28

Decompose the steps to find out where the GVL can be bypassed.

Page 29

Experiment No.2

Page 30

Reduce the task to querying the DB only.

Page 31

DBNAME = 'wwwfsc'
CORES_COUNT = 4
limit = (ARGV[0] || 10_000).to_i

sqls = CORES_COUNT.times.map do |x|
  sprintf("SELECT * FROM cdb_posts ORDER BY pid LIMIT %d OFFSET %d;", limit, x * limit)
end

procs = CORES_COUNT.times.map do |x|
  Proc.new do
    client = Mysql2::Client.new(database: DBNAME, reconnect: true)
    result = client.query(sqls[x])
  end
end

Benchmark.bmbm(15) do |x|
  x.report("Thread") { procs.map { |p| Thread.new { p.call } }.each(&:join) }
  x.report("Fork")   { procs.each { |p| fork { p.call } }; Process.waitall }
  x.report("Normal") { procs.each(&:call) }
end

Page 32

Result of 100k*4 rows

Page 33

It seems the mysql2 gem can release the GVL.
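
That fits how mysql2 is implemented: the C-level wait on the server releases the GVL. The same effect can be illustrated without a database, since Kernel#sleep also releases the GVL while blocking (a sketch, not from the slides):

```ruby
require 'benchmark'

# Four blocking waits overlap under threads because sleep, like
# mysql2's query wait, releases the GVL while it blocks.
t_threads = Benchmark.realtime { 4.times.map { Thread.new { sleep 0.2 } }.each(&:join) }
t_serial  = Benchmark.realtime { 4.times { sleep 0.2 } }

printf("threads: %.2fs  serial: %.2fs\n", t_threads, t_serial)
```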

Page 34

Experiment No.3

Page 35

Limit the experiment to the I/O operations only.

Page 36

client = Mysql2::Client.new(database: DBNAME, reconnect: true, encoding: 'latin1')

sql_raws = sqls.map do |sql|
  arr = []
  client.query(sql).each(as: :array) do |row|
    arr << "(#{row.map(&:to_my_val).join(',')})"
  end
  arr
end

procs = sql_raws.map do |arr|
  proc do
    io = open('/dev/null', 'w')
    io.write "INSERT INTO `cdb_posts` VALUES "
    io.write arr.join(',')
    io.write "\n"
    io.close
  end
end

Benchmark.bm(15) do |x|
  x.report("Thread") { procs.map { |p| Thread.new { p.call } }.each(&:join) }
  x.report("Fork")   { procs.each { |p| fork { p.call } }; Process.waitall }
  x.report("Normal") { procs.each { |p| p.call } }
end

Page 37

Page 38

Result of 100k*4 rows

Page 39

Change the I/O to write to different files.

Page 40

procs = sql_raws.map do |arr|
  proc do
    io = Tempfile.new(SecureRandom.uuid) # was: open('/dev/null','w')
    puts io.path
    io.write "INSERT INTO `cdb_posts` VALUES "
    io.write arr.join(',')
    io.write "\n"
    io.close
  end
end

Page 41

The results were reversed.

Page 42

Apply the same change to the first experiment.

Page 43

procs = sqls.map do |sql|
  Proc.new do
    io = Tempfile.new(SecureRandom.uuid)
    Mysql2::Client.new(database: DBNAME, reconnect: true, encoding: 'latin1').query(sql).each(as: :array) do |row|
      io.write "INSERT INTO `cdb_posts` VALUES (#{row.map(&:to_my_val).join(',')});\n"
    end
    io.close
  end
end

Page 44

It does not have any effect.

Page 45

Conclusion

                          Thread      Fork        Normal
MySQL2 read               Fast        Fast        x
Transcoding & iteration   Slow        Very fast   x
Write to the same I/O     Very slow   Slow        Fast
Write to different I/O    Fast        Slow        Fast
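
One way to act on the table: use fork for the transcode-and-iterate part (where it was very fast) and give every child its own output file so no I/O is shared. A sketch with invented file names and contents, not code from the talk:

```ruby
require 'tmpdir'

# Each forked child writes its own SQL chunk; nothing is shared.
sizes = Dir.mktmpdir do |dir|
  files = 4.times.map { |i| File.join(dir, "chunk-#{i}.sql") }
  files.each do |path|
    fork { File.write(path, "-- INSERT chunk for #{path}\n") }
  end
  Process.waitall # wait for all children before reading the files
  files.map { |path| File.size(path) }
end
p sizes
```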

Page 46

There is no effective, "all-around" concurrency model.

Page 47

Small I/O operations can't release the GVL.

Page 50

End
