Spilo, a fault-tolerant PostgreSQL cluster / Oleksii Kliukin (Zalando SE)
Spilo, highly-available PostgreSQL cluster
Oleksii Kliukin Zalando SE
Zalando
• 15 EU countries
• 3 fulfilment centers
• 15+ million active customers
• 2.2 billion € revenue 2014
150 000+ products
We are growing!
Zalando platform
Our databases
• >150 production PostgreSQL databases
• >13.5 TB data
• >5 TB biggest DB
• 400-1000+ write tps
• >2 DB failures/month
Zalando never sleeps
Infrastructure bottleneck
ACID Team: create, alter, deploy, migrate, failover, upgrade
80+ teams
Radical Agility
Purpose
Autonomy
Mastery
Cloud
• 2013: ZCloud
• 2014: project Pequod
• 2015: Let’s just use AWS…
Amazon 3-letter words
• AWS - Amazon Web Services
• EC2 - Elastic Compute Cloud
• ELB - Elastic Load Balancer
• RDS - Relational DB Service
AWS
• One account per team
• Microservices
• REST/OAuth2
• Deployment with Docker
Autonomous teams on AWS
[Diagram labels: REST, Internet]
Autonomous teams
• Team decides which product to build
• … and which technologies to use
• REST/OAuth2 mandatory
• Team is responsible for its infrastructure
Databases?
• Developers should take care of infrastructure
• … including production databases
• On AWS!
Isn’t it dangerous?
DBAs running with scissors, by Gavin M. Roy: https://www.flickr.com/photos/gavinmroy/4638958958
ACID team provides PostgreSQL training
What about failover?
Autofailover tasks
• Detect the master failure
• Elect a new master
• Redirect clients
Autofailover issues
• Discarded writes
• Split-brain
• False positives
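One common way to reduce false positives, sketched here as an illustration and not as what Spilo or Patroni actually does: treat the master as failed only after several consecutive health checks fail. The `FailureDetector` class and its `threshold` parameter are hypothetical names.

```python
class FailureDetector:
    """Declare failure only after `threshold` consecutive failed checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def observe(self, check_ok):
        """Record one health-check result; return True once the node counts as failed."""
        if check_ok:
            self.consecutive_failures = 0  # any success resets the streak
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


detector = FailureDetector(threshold=3)
# Isolated failed checks (e.g. a network blip) do not trigger a failover;
# only the third failure in a row does.
results = [detector.observe(ok) for ok in (True, False, True, False, False, False)]
print(results)  # [False, False, False, False, False, True]
```

A higher threshold means fewer false positives but slower detection of a real failure.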
RDS?
• Support for PostgreSQL
• Automatic failover
• Most extensions
• Automatic backups
RDS?
• Vendor lock-in
• No superuser
• No untrusted languages
• No logical decoding plugins
• Rather expensive
EC2 + Linux HA
• Complex setup
• Lots of manual steps (e.g. new replica creation)
Spilo (სპილო)
Spilo does
• Rapid deployment of PostgreSQL on AWS EC2 instances
• Streaming replication with auto-failover
Spilo on AWS
[Diagram: one Spilo MASTER and two Spilo REPLICAs; legend: master connection, application DB request, etcd cluster status update]
Failover
[Diagram: two Spilo REPLICAs, no master; legend: master connection, application DB request, etcd cluster status update]
Failover
[Diagram: one Spilo MASTER and one Spilo REPLICA, with the note "NEW SPILO STARTS…"; legend: master connection, application DB request, etcd cluster status update]
Failover
[Diagram: one Spilo MASTER and two Spilo REPLICAs; legend: master connection, application DB request, etcd cluster status update]
What is Spilo?
[Diagram: three Patroni nodes (one MASTER, two REPLICAs); label: auto-scaling group]
Patroni (პატრონი)
• Handles new replicas and failover
• Based on ideas and code of the Compose Governor
• Open-source
Compose Governor idea
Core to our PostgreSQL HA system is the Governor application which uses etcd as its repository of truth to discover which database instance is leader.
Distributed configuration systems
• Fault tolerant
• Reliably store small amounts of strongly-consistent data between distributed nodes
• Good for storing the PostgreSQL cluster state
Distributed consensus
[Diagram: one LEADER and three CLIENTs]
Cluster state in etcd
$ etcdctl ls --recursive /service
/service/batman
/service/batman/optime
/service/batman/optime/leader
/service/batman/members
/service/batman/members/postgresql0
/service/batman/members/postgresql1
/service/batman/initialize
/service/batman/leader
Leader key
$ etcdctl get /service/batman/leader
postgresql0
• Points to the member key
• Has a TTL, autoexpires
• Acts as an exclusive lock
• Only the leader can become the master
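The leader key behaviour listed above (TTL, autoexpiry, exclusive lock) can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not an etcd client and not Patroni code: `TTLStore`, `create` and `refresh` are hypothetical names, and a fake clock replaces real time so the example runs instantly.

```python
import time


class TTLStore:
    """Toy in-memory stand-in for a DCS key with a TTL (not a real etcd client)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(key, None)  # expired keys disappear
            return None
        return entry[0]

    def create(self, key, value, ttl):
        """Set the key only if it does not exist (a real DCS does this atomically)."""
        if self.get(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True

    def refresh(self, key, value, ttl):
        """Extend the TTL only if the key still holds `value`."""
        if self.get(key) != value:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True


now = [0.0]  # fake clock, advanced by hand
store = TTLStore(clock=lambda: now[0])
LEADER = "/service/batman/leader"

assert store.create(LEADER, "postgresql0", ttl=30)      # first node takes the lock
assert not store.create(LEADER, "postgresql1", ttl=30)  # second node is refused
now[0] = 20
assert store.refresh(LEADER, "postgresql0", ttl=30)     # leader keeps renewing
now[0] = 60                                             # leader stops renewing, TTL runs out
assert store.get(LEADER) is None
assert store.create(LEADER, "postgresql1", ttl=30)      # another node takes over
print(store.get(LEADER))  # postgresql1
```

The important property is that taking the lock is a "create only if absent" operation, so two nodes can never both believe they hold it.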
Leader TTL
$ http http://127.0.0.1:2379/v2/keys/service/batman/leader
…
{
    "action": "get",
    "node": {
        "createdIndex": 48723,
        "expiration": "2015-10-23T14:51:49.686506977Z",
        "key": "/service/batman/leader",
        "modifiedIndex": 49037,
        "ttl": 27,
        "value": "postgresql0"
    }
}
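For illustration, the response body shown on this slide can be read with Python's standard `json` module to find the current leader and the remaining TTL. The helper name `leader_from_response` is hypothetical; the JSON is the one from the slide.

```python
import json

# Response body as shown on the slide (etcd v2 keys API).
RESPONSE = """
{
    "action": "get",
    "node": {
        "createdIndex": 48723,
        "expiration": "2015-10-23T14:51:49.686506977Z",
        "key": "/service/batman/leader",
        "modifiedIndex": 49037,
        "ttl": 27,
        "value": "postgresql0"
    }
}
"""


def leader_from_response(body):
    """Return (leader name, remaining TTL in seconds) from a leader-key response."""
    node = json.loads(body)["node"]
    return node["value"], node["ttl"]


name, ttl = leader_from_response(RESPONSE)
print(name, ttl)  # postgresql0 27
```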
Member key
$ etcdctl get /service/batman/members/postgresql0
{"role":"master",
 "state":"running",
 "conn_url":"postgres://replicator:[email protected]:5432/postgres",
 "api_url":"http://127.0.0.1:8008/patroni",
 "xlog_location":67108960}
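A sketch of how member keys like the one above could feed a leader election. The rule used here (prefer the running replica with the highest `xlog_location`, to limit discarded writes) is an assumption for illustration; the slides do not spell out the exact rule. Member names other than the key layout, the values, and the function `best_candidate` are hypothetical.

```python
import json

# Hypothetical member-key values, trimmed to the fields used below.
RAW_MEMBERS = {
    "postgresql1": '{"role":"replica","state":"running","xlog_location":67108960}',
    "postgresql2": '{"role":"replica","state":"running","xlog_location":67108000}',
    "postgresql3": '{"role":"replica","state":"stopped","xlog_location":67109999}',
}


def best_candidate(raw_members):
    """Pick the running replica that has received the most WAL, or None."""
    members = {name: json.loads(raw) for name, raw in raw_members.items()}
    running = {
        name: info
        for name, info in members.items()
        if info["role"] == "replica" and info["state"] == "running"
    }
    if not running:
        return None
    return max(running, key=lambda name: running[name]["xlog_location"])


candidate = best_candidate(RAW_MEMBERS)
print(candidate)  # postgresql1
```

postgresql3 has the highest position but is not running, so it is skipped.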
Connection and API URL
[Diagram: two Patroni nodes (MASTER, REPLICA); labels: API URL (check health during promotion), CONNECTION URL, MASTER LB, REPLICA LB]
Initialize key
$ etcdctl get /service/batman/initialize
6208852353820383446
• PostgreSQL cluster system ID
• Created by the first node that joins the cluster
• Nodes with a different system ID are not allowed to join
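The join rule above can be written as a small check. This is a simplified sketch: `may_join` is a hypothetical helper, and it assumes every node already knows its own system ID, which ignores the case of an empty node that first has to be imaged from the master.

```python
def may_join(initialize_key, local_system_id):
    """Return (allowed, value the initialize key should hold afterwards).

    initialize_key is the current value of the key, or None if it does not exist.
    """
    if initialize_key is None:
        # First node: it creates the key with its own system ID.
        return True, local_system_id
    # Later nodes must belong to the same PostgreSQL cluster.
    return initialize_key == local_system_id, initialize_key


CLUSTER_ID = "6208852353820383446"  # value from the slide

print(may_join(None, CLUSTER_ID))           # (True, '6208852353820383446')
print(may_join(CLUSTER_ID, CLUSTER_ID))     # (True, '6208852353820383446')
print(may_join(CLUSTER_ID, "1234567890"))   # (False, '6208852353820383446')
```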
Patroni modules
[Diagram of module blocks: ETCD, ZOOKEEPER, ABSTRACT DCS, PostgreSQL, REST API, High availability, Asynchronous executor, Callbacks]
From Governor to Patroni
[Diagram labels: Governor, Patroni]
Location of etcd: original
[Diagram: three Governor nodes]
Replace etcd with proxy
[Diagram: three Governor nodes and three proxies]
Embed etcd client in Patroni
[Diagram: three Patroni nodes]
Patroni improvements
• Robust exception handling
• Run long-running tasks (e.g. base backup) in a separate thread
• etcd + ZooKeeper
• REST API
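Why a separate thread matters: a base backup can take far longer than the leader key TTL, so the main loop must keep running while it happens. A minimal sketch of the idea with the standard `threading` module follows; `AsyncExecutor` and `fake_base_backup` are hypothetical names, and Patroni's real asynchronous executor may look different.

```python
import threading


class AsyncExecutor:
    """Run at most one long task in a background thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._thread = None

    def busy(self):
        with self._lock:
            return self._thread is not None and self._thread.is_alive()

    def run(self, func, *args):
        """Start func in the background; return False if a task is already running."""
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                return False
            self._thread = threading.Thread(target=func, args=args, daemon=True)
            self._thread.start()
            return True

    def wait(self):
        with self._lock:
            thread = self._thread
        if thread is not None:
            thread.join()


release = threading.Event()
log = []


def fake_base_backup(name):
    release.wait()  # stands in for a slow base backup
    log.append(f"{name}: base backup done")


executor = AsyncExecutor()
started = executor.run(fake_base_backup, "postgresql1")
refused = not executor.run(fake_base_backup, "postgresql1")  # still busy
ticks = 0
while ticks < 3:  # the main loop keeps ticking while the backup runs
    ticks += 1
release.set()     # let the "backup" finish
executor.wait()
print(started, refused, executor.busy(), log)
```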
Patroni improvements
• Configurable replica imaging
• Support for pg_rewind
Patroni improvements
• Manual failover
• Initialize from external cluster
• Attach to already running PostgreSQL nodes
• Tags (e.g. nofailover)
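As an illustration of the `nofailover` tag, a node carrying it can simply be left out of the candidate list during an election. The data layout and the function `failover_candidates` below are assumptions for this sketch; only the tag name comes from the slide.

```python
def failover_candidates(members):
    """Return the names of members that are allowed to become the master."""
    return [
        name
        for name, info in members.items()
        if not info.get("tags", {}).get("nofailover", False)
    ]


# Hypothetical cluster: postgresql1 is tagged so that it is never promoted.
members = {
    "postgresql1": {"tags": {"nofailover": True}},
    "postgresql2": {"tags": {}},
    "postgresql3": {},
}
print(failover_candidates(members))  # ['postgresql2', 'postgresql3']
```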
What you should monitor
• replication lag
• unhealthy member
• no leader
• etcd/ZooKeeper
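The first three checks can be derived from the cluster state kept in the DCS. The sketch below assumes member data shaped like the member key shown earlier (`state`, `xlog_location`); the function name `cluster_alerts`, the 16 MB lag threshold and the sample values are hypothetical. The health of etcd or ZooKeeper itself has to be monitored separately.

```python
def cluster_alerts(leader, members, max_lag_bytes=16 * 1024 * 1024):
    """Return a list of alert strings for one snapshot of the cluster state."""
    alerts = []
    leader_xlog = None
    if leader is None:
        alerts.append("no leader")
    else:
        leader_xlog = members.get(leader, {}).get("xlog_location")
    for name, info in sorted(members.items()):
        if info.get("state") != "running":
            alerts.append(f"unhealthy member: {name}")
            continue
        if leader_xlog is not None and name != leader:
            lag = leader_xlog - info.get("xlog_location", 0)
            if lag > max_lag_bytes:
                alerts.append(f"replication lag on {name}: {lag} bytes")
    return alerts


members = {
    "postgresql0": {"state": "running", "xlog_location": 100_000_000},
    "postgresql1": {"state": "running", "xlog_location": 99_999_000},
    "postgresql2": {"state": "running", "xlog_location": 50_000_000},
    "postgresql3": {"state": "stopped", "xlog_location": 100_000_000},
}
print(cluster_alerts("postgresql0", members))
print(cluster_alerts(None, members))
```

With `postgresql0` as leader this reports the lag on postgresql2 and the stopped postgresql3; with no leader it reports "no leader" and the stopped member.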
Thank you!
• Spilo: github.com/zalando/spilo, spilo.readthedocs.org
• Patroni: github.com/zalando/patroni, patroni.readthedocs.org
• Feedback: @alexeyklyukin