Shard-Query, an MPP database for the cloud using the LAMP stack
-
Upload
justin-swanhart -
Category
Data & Analytics
-
view
814 -
download
113
Embed Size (px)
description
Transcript of Shard-Query, an MPP database for the cloud using the LAMP stack

Shard-QueryAN MPP DATABASE FOR THE CLOUD USING THE
LAMP STACK

Introduction
Presenter
• Justin Swanhart
• Principal Support Engineer at Percona
• Previously a trainer and consultant at Percona too
Developer
• Swanhart-tools
• Shard-Query – MPP sharding middleware for MySQL
• Flexviews – Materialized views (fast refresh) for MySQL
• bcmath UDF – arbitrary precision math for MySQL

Intended Audience
• MySQL users with data too large to query efficiently using a single machine
• Big Data
• Analytics / OLAP
• User generated content analysis
• People interested in distributed database processing

Terms

MPP – Massively Parallel Processing
• An MPP system is a system that can process a SQL statement in parallel on a single machine or even many machines
• A collection of machines is often called a Grid
• MPP is also sometimes called Grid Computing

MPP (cont)
• Not many open source databases (none?) support MPP
• Community editions of closed source offerings are limited
• Some closed source databases include Vertica, Greenplum, Redshift

The Cloud
• Managed collection of virtual servers
• Easy to add servers on demand
• Ideal for a federated, distributed database grid
• Easy to “scale up” by moving to a VM with more cores
• Easy to “scale out” by adding machines
• Amazon is one of the most popular cloud environments

LAMP stack
• Linux
• Amazon Linux
• RHEL
• Ubuntu LTS, etc.
• Apache Web Server
• Most popular web server on the planet
• MySQL
• The world’s most popular open source database
• PHP
• High level language makes development easier

Database Middleware
• A piece of software that sits between an end-user application and the database
• Operates on the queries submitted by the application, then returns the results to the application
• Usually a proxy of some sort
• MySQL proxy is the open source user configurable proxy for MySQL
• Supports Lua scripts which intercept queries
• Shard-Query can use MySQL Proxy out of the box

Message Queue / Job Server
• Accepts jobs or messages and places them in a queue
• A worker reads jobs/messages from the queue and acts on them
• Offers support for asynchronous jobs
• Gearman
• My job server of choice for PHP
• Has two different PHP interfaces (pear and pecl)
• SQ comes bundled with a modified version* of the pear interface
• Excellent integration with MySQL as well (UDF)
* Removes warnings triggered by modern PHP strict mode

Sharding
• It is a short for Shared Nothing
• Means splitting up your data onto more than one machine
• Tables that are split up are called sharded tables
• Lookup tables are not sharded. In other words, they must be duplicated on all nodes
• Shard-Query supports directory based or hash based sharding

Shard mapper
• Shard-Query supports DIRECTORY and HASH mapping out of the box
• DIRECTORY based sharding allows you to add or remove shards from the system, but lookups may go over the network, reducing performance* compared to HASH mapping
• HASH based sharding uses a hash algorithm to balance rows over the sharded database. However, since a HASH algorithm is used, the number of database shards can not change after initial data loading.
* But only for queries like “select count(*) from table where customer_id = 50”

What is “big data”
Most machine generated data
• Line order information for a large organization like Wal-Mart™
• Any data so large that you can’t effectively operate on it on one machine
• For example, an important query that needs to run daily executes in greater than 24 hours. It is impossible to meet the daily goal unless you can find a way to make the query execute faster.
• These kind of problems can happen on relatively small amounts of data (tens of gigabytes)

Analytics(OLAP) versus OLTP
• OLTP is focused on short lived small transactions that read or write small amounts of data
• OLAP is focused on bulk loading and reading large amounts of data in a single query.
• Aggregation queries are OLAP queries
• Shard-Query is designed for analytics (OLAP) not OLTP
• must parse all commands sent to it (and make multiple round trips)
• Minium query time of around 20ms

PROBLEM: Single Threaded QueriesTHE BIGGEST BOTTLENECK IN ANALYTICAL QUERIES IS THE SPEED OF A SINGLE CORE

Single thread queries in the database
• MySQL, PostgreSQL, Firebird and all other major open source databases have single threaded queries
• This means that a single query can only ever utilize the resources of a single core
• As the data size grows, analytical queries get slower and slower
• In memory, as the data grows the speed decreases because the data is accessed in a single query
• As the number of rows to be examined increases, performance decreases

Why single threaded
• MySQL is optimized for getting small amounts of data quickly(OLTP)
• It was created at a time when having more than one CPU was not common
• Adding parallelism now is a very complex task, particularly since MySQL supports multiple storage engines
• So adding parallel query is not a high priority (not even on the roadmap)
• Designed to run LOTS of small queries simultaneously, not one big query

Single Threading – bad for IO
• If the data set is significantly larger than memory, single threaded queries often cause the buffer pool to "churn“
• For example, small lookup tables can easily be pushed out of the buffer pool, resulting in frequent IO to look up values
• While SSD may helps somewhat, one database thread can not read from an SSD at maximum device capacity
• While the disk may be capable of 1000+ MB/sec, a single thread is generally limited to <100MB/sec (usually 30-40)
• This is because a single thread shares doing IO AND running the query on one CPU (MySQL does not use read threads for queries)

The OLAP Example
• A large company maintains a star schema of their sales history for analytics purposes
• This company likes to present a sum total of orders for all time on the dashboard
• In the beginning the query is very fast
• It gets slower, though, as months of data are added and as the business grows, data increases too
• Eventually the query takes more than 24 hours to run, which means it can no longer be updated daily
• “Drill down” gets slower as data increases

What can be done?
• Caching?
• Materialized views?
• Partitioning?
• Sharding?

Making OLAP more like OLTP!
• Shard-Query breaks on big query up into smaller queries that can access the database in parallel
• Partitioning and sharding are used to keep data size for any single query to a minimum
• If your table has 16 partitions, you can get up to 16 way parallelism
• If you also have 2 nodes, you get 32 way parallelism, and so on
• You can use multiple database schema on a single server instead (a form of sharding) if you don’t partition your data

Shard-QueryADDING PARALLELISM TO QUERIES

Sharding Reviewed
• A sharded database contains multiple nodes or databases called shards
• One physical machine might host many shards
• Each shard has identical schema

Sharding Reviewed (cont)
• The multiple shards function together as one RDBMS system.
• You can think of the shards as a big UNION ALL of the data, with only a portion of the data on any one machine
• A mechanism must control which server on which to place particular pieces of data.
• In Shard-Query a particular column controls data placement – this is called the shard key

Sharding – Data distribution
• There are usually one or two large tables that are sharded
• These are usually called FACT tables
• An example might be blogs, blog_posts and blog_comments. All three share a “shard key” of blog_id
• Most common case is one big table with smaller lookup tables

Sharding Reviewed (cont)
• The shard key is very important!
• Since a specific column acts as the “shard key”, all sharded tables must contain the shard key.
• For example: blog_id might be the shard key.
• The rows for a specific blog_id are then located on the same shard in any table that has the blog_id column

Optimization - Shard Elimination
• When Shard-Query sees an expression on the shard key it looks up* the shard that contains the appropriate data and only sends queries to the necessary shards.
• Equality lookup is most efficient, but IN, BETWEEN and other operators are allowed as well
• Lookups may not use subqueries (ie, blog_id IN (1,2,3) is okay, not blog_id in (select …))
• This is called “shard elimination”
• Shard elimination is analogous to partition elimination.
• where blog_id = 10, for example

Can Shard-Query help on 1 machine?
• Yes! - Use MySQL partitioning on a single machine
• Shard-Query can access the partitions of a table in parallel!
• This means that if you have many partitions, then Shard-Query can utilize many cores to answer the query
Use partitions for parallelism

How does that work?
• Shard-Query executes an EXPLAIN PLAN on the query
• This EXPLAIN PLAN shows the partitions that MySQL will access when running the query
• Shard-Query uses the 5.6 PARTITION hint to generate one query per partition
• These queries can execute in parallel

Sharding can help too
• How?
• Shard-Query adds parallelism to queries by spreading them over nodes in parallel
• Spread the data over four nodes and queries are 4x faster
MySQL database shards
Shard-Query

Sharding + Partitioning is best
• Why?
• Partition the tables to add parallelism to each node
• Use sharding to have multiple nodes working together
• 4 nodes with 3 partitions each = 12 way parallelism
Shard-Query
MySQL database shards
Partitions

Shard-QueryARCHITECTURE

Configuration Repository
• Shard-Query stores all configuration information in a MySQL database called the configuration repository
• This should be a highly available replication pair (or XtraDBcluster) for HA
• Web interface can change the settings
• Manual settings changes can be done via SQL
• schemata_config table in Shard-Query repository
• Makes using Shard-Query easier, especially when using more than one node

PHP OO Apache Web
Interface
MySQL Proxy
Gearman Message Queue
Worker Worker Worker Worker
MySQL database shards
Shard-Query Architecture
Interfaces
Communication
Workers
Storage
ConfigRepository
Configuration Management

PHP OO Apache Web
Interface
MySQL Proxy
Gearman Message Queue
Worker Worker Worker Worker
MySQL database shards
Shard-Query ArchitectureGearman job server
• Provides the parallel mechanism for Shard-Query
• Multiple Gearman are supported for HA
• Enables Shard-Query to use a map/reduce like architecture
• Sends jobs to workers when they arrive at the queue
• If all workers are busy the job waits

Gearman at a glance
Shard-Query OO
Store-resultsetLoader workerSQ run SQL worker

PHP OO Apache Web
Interface
MySQL Proxy
Gearman Message Queue
Worker Worker Worker Worker
MySQL database shards
Shard-Query ArchitectureThree kinds of workers
• loader_worker – Listens for loader jobs and executes them. Used by parallel loader.
• shard_query_worker – Listens for SQL jobs, runs the job via Shard-Query and returns the results as JSON. Used by web and proxy interfaces.
• store_resultset_worker – Main worker used by Shard-Query. It runs SQL and stores the result in a table.

PHP OO Apache Web
Interface
MySQL Proxy
Gearman Message Queue
Worker Worker Worker Worker
MySQL database shards
Shard-Query ArchitecturePHP Object Oriented Interface
• Very simple to use
• Constructor parameters not even usually needed
• Just one function to run a SQL query and get results back
• Complete example comes with Shard-Query as: bin/run_query

PHP OO Example (from bin/run_query):
$shard_query = new ShardQuery();$stime = microtime(true);$stmt = $shard_query->query($sql);$endtime = microtime(true);
if(!empty($shard_query->errors)) {if(!empty($shard_query->errors)) {echo "ERRORS RETURNED BY OPERATION:\n";print_r($shard_query->errors);
}}
if(is_resource($stmt) || is_object($stmt)) {$count=0;while($row = $shard_query->DAL->my_fetch_assoc($stmt)) {
print_r($row);++$count;
}echo "$count rows returned\n";$shard_query->DAL->my_free_result($stmt);
} else {if(!empty($shard_query->info)) print_r($shard_query->info);echo "no query results\n";
}echo "Exec time: " . ($endtime - $stime) . "\n";
Simple data access layercomes with Shard-Query
Errors are returned as a memberof the object
Run the query

PHP OO Apache Web
Interface
MySQL Proxy
Gearman Message Queue
Worker Worker Worker Worker
MySQL database shards
Shard-Query ArchitectureApache web interface
• GUI
• Easy to set up
• Run queries and get results
• Serves as an example of using Shard-Query in a web app with asynchronous queries
• Submits queries via Gearman
• Simple HTTP authentication

PHP OO Apache Web
Interface
MySQL Proxy
Gearman Message Queue
Worker Worker Worker Worker
MySQL database shards
Shard-Query ArchitectureMySQL Proxy Interface
• LUA script for MySQL Proxy
• Supports most SHOW commands
• Intercepts queries, and sends them to Shard-Query using the MySQL Gearman UDF
• Serves as another example of using Gearman to execute queries.
• Behaves slightly differently than MySQL for some commands

Query submitted
SQL is parsed
Query rewritefor parallelismyields multiple
queries
Gearman Jobs (map/combine)
Final Aggregation (reduce)
Return result
Shard-Query Data FlowMap/reduce like workflow

Query submitted
SQL is parsed
Query rewritefor parallelismyields multiple
queries
Gearman Jobs (map/combine)
Final Aggregation (reduce)
Return result
Shard-Query Data Flow

SQL Parser
• Find it at http://github.com/greenlion/php-sql-parser
• Supports
• SELECT/INSERT/UPDATE/DELETE
• REPLACE
• RENAME
• SHOW/SET
• DROP/CREATE INDEX/CREATE TABLE
• EXPLAIN/DESCRIBE
Used by SugarCRM too, aswell as other open source projects.

Query submitted
SQL is parsed
Query rewritefor parallelismyields multiple
queries
Gearman Jobs (map/combine)
Final Aggregation (reduce)
Return result
Shard-Query Data Flow

Query Rewrite for parallelism
• Shard-Query has to manipulate the SQL statement so that it can be executed over more than on partition or machine
• COUNT() turns into SUM of COUNTs from each query
• AVG turns into SUM and COUNT
• SEMI-JOIN is turned into a materialized join
• STDDEV/VARIANCE are rewritten as well use the sum of squares method
• Push down LIMIT when possible

Query Rewrite for parallelism (cont)
• Because lookup tables are duplicated on all shards, the query executes in a shared-nothing way
• All joins, filtering and aggregation are pushed down
• Mean very little data must flow between nodes in most cases
• High performance
• Meets or beats Amazon Redshift in testing at 200GB of data

Query submitted
SQL is parsed
Query rewritefor parallelismyields multiple
queries
Gearman Jobs (map/combine)
Final Aggregation (reduce)
Return result
Shard-Query Data Flow

Map/Combine
• The store_resultset gearman worker runs SQL and stores the result in a table
• To keep the number of rows in the table (and the time it takes to aggregate results in the end) small, an INSERT … ON DUPLICATE KEY UPDATE (ODKU) statement is used when inserting the rows
• There is a UNIQUE KEY over the GROUP BY attributes to facilitate the upsert

Query submitted
SQL is parsed
Query rewritefor parallelismyields multiple
queries
Gearman Jobs (map/combine)
Final Aggregation (reduce)
Return result
Shard-Query Data Flow

Final aggregation
• Shard-Query has to return a proper result, combining the results in the result table together to return the correct answer
• Again, for example COUNT must be rewritten as SUM to combine all the counts (from each shard) in the result table
• Aggregated result is returned to the client

Shard-Query Flow as SQL[justin@localhost bin]$ ./run_query --verboseselect count(*) from lineorder;Shard-Query optimizer messages:
SQL TO SEND TO SHARDS:Array(
[0] => SELECT COUNT(*) AS expr_2913896658FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1
[1] => SELECT COUNT(*) AS expr_2913896658FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1
[2] => SELECT COUNT(*) AS expr_2913896658FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1
[3] => SELECT COUNT(*) AS expr_2913896658FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 )SQL TO SEND TO COORDINATOR NODE:SELECT SUM(expr_2913896658) AS ` count `FROM `aggregation_tmp_58392079`
Array(
[count ] => 0)1 rows returnedExec time: 0.03083610534668
Initial query
Query rewrite / map
Final aggregation / reduce
Final result

Map/Combine exampleselect LO_OrderDateKey, count(*) from lineorder group by LO_OrderDateKey;Shard-Query optimizer messages:* The following projections may be selected for a UNIQUE CHECK on the storage node operation:expr$0
* storage node result set merge optimization enabled:ON DUPLICATE KEY UPDATE
expr_2445085448=expr_2445085448 + VALUES(expr_2445085448)
SQL TO SEND TO SHARDS:Array(
[0] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 GROUP BY expr$0
[1] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 GROUP BY expr$0
[2] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 GROUP BY expr$0
[3] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 GROUP BY expr$0)SQL TO SEND TO COORDINATOR NODE:SELECT expr$0 AS `LO_OrderDateKey`,SUM(expr_2445085448) AS ` count `FROM `aggregation_tmp_12033903` GROUP BY expr$0
combine
reduce

Use cases

Machine generated data
• Sensor readings
• Metrics
• Logs
• Any large table with short lookup tables
Star schema are ideal

Call detail records
• Shard-Query is used in the billing system of a large cellular provider
• CDRs generate a lot of data
• Shard-Query includes a fast PERCENTILE function

Green energy meter processing
• High volume of data means sharding is necessary
• With Shard-Query, reporting is possible over all the shards, making queries possible that would not work with Fabric or other sharding solutions
• Used in India for reporting on a green power grid

Log analysis
• Performance logs from a web application for example
• Aggregate many different statistics and shard if log volumes are high enough
• Search text logs with regular expressions

Performance

Star Schema Benchmark – SF 20
• 119 million rows of data (12GB)
• Infobright Community Database
• Only 1st query from each “flight” selected
• Unsharded compared to four shards (box has 4 cpu - Amazon m1.xlarge)

COLD
• MySQL – 35.39s
• Shard-Query – 11.62s
HOT
• MySQL – 10.99s
• Shard-Query – 2.95s
Query 1
select sum(lo_extendedprice*lo_discount) as revenuefrom lineorder join dim_date on lo_orderdatekey = d_datekeywhere d_year = 1993and lo_discount between 1 and 3and lo_quantity < 25;

COLD
• MySQL – 34.24s
• Shard-Query – 12.74s
HOT
• MySQL – 12.74s
• Shard-Query – 3.26s
Query 2
select sum(lo_revenue), d_year, p_brandfrom lineorderjoin dim_date on lo_orderdatekey = d_datekeyjoin part on lo_partkey = p_partkeyjoin supplier on lo_suppkey = s_suppkeywhere p_category = 'MFGR#12'and s_region = 'AMERICA'group by d_year, p_brandorder by d_year, p_brand;

COLD
• MySQL – 27.29s
• Shard-Query – 7.97s
HOT
• MySQL – 18.89
• Shard-Query – 5.06s
Query 3
select c_nation, s_nation, d_year, sum(lo_revenue) as revenuefrom customer join lineorderon lo_custkey = c_customerkeyjoin supplier on lo_suppkey = s_suppkeyjoin dim_date on lo_orderdatekey = d_datekeywhere c_region = 'ASIA'and s_region = 'ASIA'and d_year >= 1992 and d_year <= 1997group by c_nation, s_nation, d_yearorder by d_year asc, revenue desc;

COLD
• MySQL – 23.02s
• Shard-Query – 8.48s
HOT
• MySQL – 14.77
• Shard-Query – 4.29s
Query 4
select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profitfrom lineorder join dim_date on lo_orderdatekey = d_datekeyjoin customer on lo_custkey = c_customerkeyjoin supplier on lo_suppkey = s_suppkeyjoin part on lo_partkey = p_partkeywhere c_region = 'AMERICA'and s_region = 'AMERICA'and (p_mfgr = 'MFGR#1'or p_mfgr = 'MFGR#2')group by d_year, c_nationorder by d_year, c_nation;