Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!
-
Upload
matheus-espanhol -
Category
Data & Analytics
-
view
1.615 -
download
2
Transcript of Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!
![Page 1: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/1.jpg)
![Page 2: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/2.jpg)
Agenda
● Bancos de dados analíticos
● Escalabilidade vertical e horizontal
● Paralelismo interconsulta e intraconsulta
● Sharding
● postgres_fdw
● PL/Proxy
● pg_shard
● PostgresXL
● Outras ferramentas
![Page 3: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/3.jpg)
Bigdata
● Volume
● Velocidade
● Variedade
Como o Postgres pode “surfar essa onda”?
![Page 4: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/4.jpg)
Bancos de dados analíticos
● Orientado a assunto● Integração de dados● Dados históricos● Sumarização● Intervalo de atualização curto ou em tempo real● Relatórios complexos em tempo real
Pode fazer parte da solução Bigdata ou, dependendo da necessidade, centralizar toda a solução
![Page 5: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/5.jpg)
Bancos de dados analíticos
● OLAP Online Analytical Processing● MPP Massive Parallel Processing● Armazenamento Colunar● Compressão de dados em memória● Recursos para queries analíticas
![Page 6: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/6.jpg)
PostgreSQL: Base para produtos
Aster DataCitusDB
Greenplum
HadoopDB
HadaptNetezza
ParAccel
pipelineDB
Red Shift
Vertica
Yahoo! Everest
![Page 7: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/7.jpg)
PostgreSQL: Base para produtos
Aster DataCitusDB
Greenplum
HadoopDB
HadaptNetezza
ParAccel
pipelineDB
Red Shift
Vertica
Yahoo! Everest
![Page 8: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/8.jpg)
Escalabilidade
Performance aumenta proporcionalmente ao aumento dos recursos de hardware.
Vertical Horizontal
![Page 9: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/9.jpg)
Escalabilidade vertical no Postgres
● Alta concorrência● Arquitetura multiprocesso● MVCC● CPU: Até 64 cores● Memória: Shared buffers (Melhorias na 9.5)● Disco: Parâmetros para tuning de I/O
OLTP
![Page 10: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/10.jpg)
Mas...
IDLEIDLE
IDLE
IDLE
IDLEIDLE
IDLE
IDLE
SELECT
IDLE
IDLE
![Page 11: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/11.jpg)
Mas...
IDLEIDLE
IDLE
IDLE
IDLEIDLE
IDLE
IDLE
SELECT
IDLE
IDLE
![Page 12: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/12.jpg)
Paralelismo
Grandes problemas podem ser divididos em problemas menores para serem resolvidos de
forma concorrente (em paralelo).
Vamos por partes...
![Page 13: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/13.jpg)
Paralelismo na leitura sequencial
● Postgres 9.4: Infraestrutura básica– Memória compartilhada dinâmica
– Background workers dinâmicos
– Fila de mensagens em memória
● Postgres 9.5– Alguns patches com mudanças na arquitetura
– Propagação de erros
![Page 14: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/14.jpg)
Paralelismo na leitura sequencial
● Postgres 9.6?– Partial Seq Scan
– Funnel
![Page 15: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/15.jpg)
Escalabilidade horizontal
● Escalabilidade apenas de leitura– Streaming Replication + Hot Standby
![Page 16: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/16.jpg)
Escalabilidade horizontal
● Escalabilidade de escrita– FDW
– PL/Proxy
– pg_shard
– PostgresXL
![Page 17: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/17.jpg)
Escalabilidade horizontal
● Shared memory– CPUs compartilham memória e disco
– Controle de transações simplificado
– Escalabilidade limitada
● Shared disk– Apenas o disco é compartilhado
– Escalabilidade um pouco menos limitada
● Shared nothing– Sharding
– Cada nó com seu próprio conjunto de dados
– Controle de transações complexo
– Maior escalabilidade
![Page 18: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/18.jpg)
Sharding
● Particionamento horizontal de uma ou mais tabelas em instâncias ou servidores distintos
![Page 19: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/19.jpg)
Sharding
● Via aplicação– Distribuição de dados por funcionalidade
– Modelo de dados específico
– Mais simples
do ponto de vista do DBA
![Page 20: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/20.jpg)
Sharding
● Via banco de dados
● Desafios– Escolher a melhor chave de particionamento
– ACID
– Queries distribuídas
– JOIN entre shards
– Foreign keys
![Page 21: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/21.jpg)
Devemos considerar que...
● A solução de sharding precisa balancear os dados entre os servidores e garantir a integridade dos dados
● A falha de um shard impacta todos os outros● Ambiente complexo para administrar● A replicação deve ocorrer em cada shard● Backup deve ser consistente entre os shards
![Page 22: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/22.jpg)
Testes executadosCREATE TABLE ratings (
userId int,
movieId int,
rating numeric,
timestamp bigint);
CREATE TABLE movies(
movieId int,
title text,
others jsonb);
http://grouplens.org/datasets/movielens
![Page 23: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/23.jpg)
postgres_fdw (>=9.3)
● Foreign Data Wrapper para o próprio Postgres● Acesso aos shards através de Foreign Tables● Implementado através de views● Consegue filtrar os dados nos shards● Não acessa vários shards em paralelo● Todas as queries relacionadas ao sharding
devem utilizar a view– Views com UNION ALL não são atualizáveis
– Necessidade de RULE ou TRIGGER para alterações
![Page 24: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/24.jpg)
CREATE EXTENSION postgres_fdw;
CREATE SERVER srv_pgdaycps1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'pgdaycps_node1',dbname 'pgdaycps');CREATE SERVER srv_pgdaycps2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'pgdaycps_node2',dbname 'pgdaycps');CREATE SERVER srv_pgdaycps3 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'pgdaycps_node3',dbname 'pgdaycps');
CREATE USER MAPPING FOR postgres SERVER srv_pgdaycps1 OPTIONS (user 'postgres', password 'postgres');CREATE USER MAPPING FOR postgres SERVER srv_pgdaycps2 OPTIONS (user 'postgres', password 'postgres');CREATE USER MAPPING FOR postgres SERVER srv_pgdaycps3 OPTIONS (user 'postgres', password 'postgres');
postgres_fdw (>=9.3)
![Page 25: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/25.jpg)
CREATE FOREIGN TABLE fdw_movies1 (movieId int,title text,others jsonb) SERVER srv_pgdaycps1 OPTIONS (schema_name 'movielens', table_name 'fdw_movies');CREATE FOREIGN TABLE fdw_movies2 (movieId int,title text,others jsonb) SERVER srv_pgdaycps2 OPTIONS (schema_name 'movielens', table_name 'fdw_movies');CREATE FOREIGN TABLE fdw_movies3 (movieId int,title text,others jsonb) SERVER srv_pgdaycps3 OPTIONS (schema_name 'movielens', table_name 'fdw_movies');
CREATE OR REPLACE VIEW fdw_movies ASSELECT 'node1' AS node,* FROM fdw_movies1UNION ALLSELECT 'node2' AS node,* FROM fdw_movies2UNION ALLSELECT 'node3' AS node,* FROM fdw_movies3;
postgres_fdw (>=9.3)
![Page 26: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/26.jpg)
postgres_fdw (>=9.3)
EXPLAIN ANALYZE SELECT title,others>>'link_themoviedb' AS moviedb_link FROM movies WHERE movieid = 131262; Seq Scan on movies (cost=0.00..1292.30 rows=1 width=242) (actual time=3.056..3.057 rows=1 loops=1) Filter: (movieid = 131262) Rows Removed by Filter: 27306 Planning time: 0.044 ms Execution time: 3.075 ms
![Page 27: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/27.jpg)
EXPLAIN ANALYZE SELECT title,others>>'link_themoviedb' AS moviedb_link FROM fdw_movies WHERE node = 'node3' AND movieid = 131262; Result (cost=100.00..121.74 rows=5 width=64) (actual time=1.110..1.111 rows=1 loops=1) > Append (cost=100.00..121.72 rows=5 width=64) (actual time=1.106..1.107 rows=1 loops=1) > Foreign Scan on fdw_movies3 (cost=100.00..121.72 rows=5 width=64) (actual time=1.105..1.105 rows=1 loops=1) Planning time: 0.120 ms Execution time: 1.467 ms
postgres_fdw (>=9.3)
![Page 28: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/28.jpg)
postgres_fdw (>=9.5)
● Herança para FOREIGN TABLEs
![Page 29: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/29.jpg)
CREATE FOREIGN TABLE fdw_movies1() INHERITS (movies)SERVER srv_pgdaycps1 OPTIONS (schema_name 'movielens', table_name 'fdw_movies');CREATE FOREIGN TABLE fdw_movies2() INHERITS (movies)SERVER srv_pgdaycps2 OPTIONS (schema_name 'movielens', table_name 'fdw_movies');CREATE FOREIGN TABLE fdw_movies3() INHERITS (movies)SERVER srv_pgdaycps3 OPTIONS (schema_name 'movielens', table_name 'fdw_movies');
ALTER TABLE fdw_movies1 ADD CHECK (movieid BETWEEN 1 AND 50000);ALTER TABLE fdw_movies2 ADD CHECK (movieid BETWEEN 50001 AND 100000);ALTER TABLE fdw_movies3 ADD CHECK (movieid BETWEEN 100001 AND 150000);
postgres_fdw (>=9.5)
![Page 30: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/30.jpg)
EXPLAIN ANALYZE SELECT title,others>>'link_themoviedb' AS moviedb_link FROM movies WHERE movieid = 131262; Result (cost=0.00..121.74 rows=6 width=64) (actual time=1.198..1.199 rows=1 loops=1) > Append (cost=0.00..121.72 rows=6 width=64) (actual time=1.193..1.193 rows=1 loops=1) > Seq Scan on movies (cost=0.00..0.00 rows=1 width=64) (actual time=0.001..0.001 rows=0 loops=1) Filter: (movieid = 131262) > Foreign Scan on fdw_movies3 (cost=100.00..121.72 rows=5 width=64) (actual time=1.192..1.192 rows=1 loops=1) Planning time: 0.152 ms Execution time: 1.559 ms
postgres_fdw (>=9.5)
![Page 31: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/31.jpg)
PL/Proxy
● Linguagem procedural para execução de funções remotas
● Pgbouncer pode ser usado para diminuir o impacto da abertura de conexões com os nós
● Distribui as queries em paralelo● Todas as queries relacionadas ao sharding
devem utilizar funções
![Page 32: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/32.jpg)
PL/Proxy
![Page 33: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/33.jpg)
PL/Proxy
CREATE EXTENSION plproxy;
CREATE SERVER srv_pgdaycps FOREIGN DATA WRAPPER plproxy OPTIONS (connection_lifetime '1800', p0 'dbname=pgdaycps host=pgdaycps_node0 port=5432',p1 'dbname=pgdaycps host=pgdaycps_node1 port=5432',p2 'dbname=pgdaycps host=pgdaycps_node2 port=5432',p3 'dbname=pgdaycps host=pgdaycps_node3 port=5432');
CREATE USER MAPPING FOR proxy SERVER srv_pgdaycps OPTIONS (user 'node_user', password 'node_pass');
No banco de dados proxy:
![Page 34: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/34.jpg)
PL/Proxy
CREATE OR REPLACE FUNCTION insert_movies(movieId int,title text,others jsonb)RETURNS void AS $$ CLUSTER 'srv_pgdaycps'; RUN ON hashtext(movieId::text);$$ LANGUAGE plproxy;
CREATE OR REPLACE FUNCTION select_movies(fmovieId int)RETURNS TABLE(tmovieId int,title text,others jsonb) AS $$ CLUSTER 'srv_pgdaycps'; RUN ON ALL;$$ LANGUAGE plproxy;
No banco de dados proxy:
![Page 35: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/35.jpg)
PL/Proxy
CREATE FUNCTION insert_movies(movieId int,title text,others jsonb)RETURNS void AS $$BEGIN
INSERT INTO plproxy_movies VALUES (movieId,title,others);
END;$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION select_movies(fmovieId int) RETURNS TABLE(tmovieId int,title text,others jsonb) AS $$BEGIN
RETURN QUERY SELECT * FROM plproxy_moviesWHERE movieid = fmovieId;
END;$$ LANGUAGE plpgsql;
Em cada shard:
![Page 36: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/36.jpg)
PL/Proxy
EXPLAIN ANALYZE SELECT title,others>>'link_themoviedb' AS moviedb_link FROM select_movies(131262); Function Scan on select_movies (cost=0.25..12.75 rows=1000 width=64) (actual time=1.634..1.635 rows=1 loops=1) Planning time: 0.030 ms Execution time: 1.649 ms
![Page 37: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/37.jpg)
pg_shard
● Extensão do Postgres que gerencia sharding ● Facilidade de administração
– Automatiza vários processos nos shards
● Recursos para recuperação de falhas● Rebalanceamento de shards● Transparente para a aplicação● Alterações no planejador e executor do
Postgres (Hook)
![Page 38: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/38.jpg)
pg_shard
![Page 39: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/39.jpg)
pg_shard
● Particionamento lógico– Várias pequenas partições em cada shard
– Réplicas de cada partição em shards distintos
● Vantagens:– Rebalanceamento
– Recuperação de falhas
– Performance
![Page 40: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/40.jpg)
CREATE EXTENSION pg_shard;ALTER SYSTEM SET shared_preload_libraries TO 'pg_shard';cat $PGDATA/pg_worker_list.confpgdaycps_node1 5432pgdaycps_node2 5433pgdaycps_node3 5434
pg_ctl restartSELECT master_create_distributed_table('movies', 'movieid');SELECT master_create_worker_shards('movies', 21, 2);
pg_shard
No banco de dados de metadados:
Nos shards é preciso apenas criar o banco de dados e o schema da tabela que será particionada
![Page 41: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/41.jpg)
split n l/4 movies.csv chunks/find chunks/ type f | xargs n 1 P 4 sh c 'echo $0 `copy_to_distributed_table C $0 movies`'
SELECT title,others>>'genres' AS genres FROM movies_normal WHERE movieid = 1;Time: 3.344 ms
SELECT title,others>>'genres' AS genres FROM movies_sharding WHERE movieid = 1;Time: 1.392 ms
pg_shard
Alternativa para o COPY:
Selecionando dados:
![Page 42: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/42.jpg)
pg_shard
● Limitações com tabelas distribuídas:– Particionamento por intervalo (Range)
– Queries com subconsultas
– JOINs
– Restrições UNIQUE e FOREIGN KEY
– Transações em vários shards
– Alterações na tabela
– Uso do COPY
– JOIN e agregações nos shards
– Não suporta EXPLAIN
![Page 43: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/43.jpg)
pg_shard
● Versão 1.2 liberada dia 30 de Julho– Chave de particionamento composta
● Roadmap– COPY para tabelas distribuídas
– Algumas agregações nos shards
![Page 44: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/44.jpg)
PostgresXL
● Fork do PostgreSQL (Atualmente versão 9.2)● Derivado do PostgresXC
– MPP (Massively Parallel Processing)
– Ideal para ambientes OLTP e OLAP
– Melhorias de performance e estatísticas
● Sharding e replicação● Suporte completo a ACID
– MVCC distribuído (GTM)
● JOINs e agregações nos shards
![Page 45: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/45.jpg)
PostgresXL
![Page 46: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/46.jpg)
tar zxvf postgresxlv9.2src.tar.gzcd postgresxl./configuremakemake install
PostgresXL
Instalação em todos os nós do cluster:
A conexão SSH entre os nós do cluster deve ocorrer sem senha para utilização do pgxc_ctl
Em um dos nós:cd postgresxlcd contrib/pgxc_ctlmakemake install
![Page 47: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/47.jpg)
pgxc_ctl prepare
PostgresXL
Editar o arquivo $HOME/pgxc_ctl/pgxc_ctl.conf
pgxc_ctl init all
CREATE TABLE movielens.movies(movieId int,title text,others json) DISTRIBUTE BY HASH (movieId);
COPY movielens.movies FROM '/home/postgres/movies.csv' CSV;
![Page 48: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/48.jpg)
SELECT count(*) FROM movies ; 27303
EXECUTE DIRECT ON (datanode1) 'SELECT count(*) FROM movies'; 6886
EXPLAIN ANALYZE SELECT title,others AS moviedb_link FROM movies WHERE movieid = 131262; Remote Subquery Scan on all (datanode1) (cost=0.00..20.38 rows=4 width=64) (actual time=0.870..0.871 rows=1 loops=1) Total runtime: 1.226 ms
PostgresXL
![Page 49: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/49.jpg)
SELECT count(*) FROM ratings_replication WHERE movieid IN (1,131260,131262); 54147Time: 824.229 ms
SELECT count(*) FROM ratings_sharding WHERE movieid IN (1,131260,131262); 54147Time: 43.772 ms
Aggregate (cost=145191.78..145191.79 rows=1 width=0) > Remote Subquery Scan on all (datanode1,datanode2,datanode3,datanode4) (cost=5895.66..144401.92 rows=315946 width=0)
PostgresXL
![Page 50: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/50.jpg)
PostgresXL
● Limitações– Rebalanceamento dos shards
– Complexidade para acrescentar novos shards
– Configuração automática de réplicas dos componentes
![Page 51: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/51.jpg)
PostgresXL
● Roadmap– Merge com PostgreSQL 9.5
– Alta disponibilidade nativa
– GTM opcional
– Melhorias para plataforma analítica
![Page 52: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/52.jpg)
Outras opções
● cstore_fdw● IMCS: InMemory Columnar Store● BigSQL● hadoop_fdw● hdfs_fdw● hive_fdw● PGStrom
![Page 53: Escalabilidade, Sharding, Paralelismo e Bigdata com PostgreSQL? Yes, we can!](https://reader030.fdocument.pub/reader030/viewer/2022021507/58f331321a28abe0628b45ad/html5/thumbnails/53.jpg)
Mas primeiro...faça a lição de casa
● Refatore e otimize as principais rotinas● Revise a indexação● Tuning● Pool de conexões● Views materializadas (consolidação)● Particionamento de tabelas● Réplicas de leitura