High Availability Architecture for Legacy Stuff - a 10.000 feet overview
$ whoiaint
● Not a sysadmin (not worthy of the title, at least)
● Not a DevOps guru
● Not a high availability ninja
● Not a scalability jedi
Take that into account
Hypothetical Product
Find-a-Rhyme
Given a word, the application returns a set of words that rhyme.
You can filter by word class, type of rhyme, word length...
Where we’re standing
Ye olde LAMP stack
● Commonly found on shared hosting
● Network latency between PHP and DB is amazing – as in zero amazing
● Everything is a single point of failure
● Find-a-rhyme is probably safe, right? Right?
Linux
Apache
MySQL/MariaDB
PHP
Suddenly...
Dictatorship!
First order: all written communications must be in verse. And it has to rhyme.
People flock to Find-a-rhyme.
Modern Infantry by Litev, CC BY-SA 3.0, https://commons.wikimedia.org/wiki/File:Modern_infantry.png
Overview
● Load balancing
● DB clustering
● Sessions
● User assets
● Single point of failure
● Monitoring
● Security
Hardware
Pros:
● Faster than software (in general)
● Most have integrated intrusion detection and/or prevention
Cons:
● Pricey as hell
● Configuration not easily portable

Software
Pros:
● FOSS (mostly)
● Configuration is easy to reason about
Cons:
● Can be slow (depending on machine)
● If FOSS, you’re on your own
HAProxy:

frontend web
    bind find-a-rhyme.com:80
    default_backend web

backend web
    mode http
    balance leastconn
    server s1 ip.app1:80
    server s2 ip.app2:80

nginx:

server {
    listen 80;
    location / {
        proxy_pass http://web;
    }
}

upstream web {
    least_conn;
    server ip.app1;
    server ip.app2;
}
¯\_(ツ )_/¯
SSL Termination
Do it on the load balancers!
global
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3
    tune.ssl.default-dh-param 2048

frontend web
    bind find-a-rhyme.com:80
    bind find-a-rhyme.com:443 ssl crt path/to/certificate.pem
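A common companion to terminating SSL on the load balancer (an assumption on my part, not shown on the slide) is redirecting plain HTTP to HTTPS in the same frontend, using HAProxy’s `ssl_fc` fetch:

```
frontend web
    bind find-a-rhyme.com:80
    bind find-a-rhyme.com:443 ssl crt path/to/certificate.pem
    # ssl_fc is true only when the client connected over SSL/TLS,
    # so this sends every plain-HTTP request to the HTTPS port
    http-request redirect scheme https unless { ssl_fc }
    default_backend web
```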
MySQL/MariaDB
Group Replication
Pros:
● Battle tested
● Big company backed (Oracle)
Cons:
● Configuration is a PITA
XtraDB Cluster & Galera Cluster
Pretty much the same product
Pros:
● Multi-master from the start
● Partners with MariaDB
● Configuration is a breeze
Cons:
● Consensus can be a problem
Galera Cluster
● Included with MariaDB 10.1
● Make sure to also install percona-xtrabackup
● A dozen lines of configuration:

[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="my_cluster"
wsrep_cluster_address="gcomm://ip.db1,ip.db2,ip.db3"
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth="sst:somepassword"
wsrep_node_address="each.machine.ip"
wsrep_node_name="eachMachineName"
HAProxy configuration for DB

backend cluster
    mode tcp
    option tcpka
    option mysql-check user healthUser
    balance static-rr
    server db1 ip.db1:3306 check
    server db2 ip.db2:3306 check
    server db3 ip.db3:3306 check

frontend cluster
    bind loadbalancer.ip:3306
    mode tcp
    default_backend cluster
Point the connection URL in your codebase at the load balancer.
This configuration means the application servers connect to the cluster via the load balancers, which in turn connect to the DB servers. Network latency will be an issue.
Session Handling
Sticky sessions
Pros:
● Easy configuration on load balancer
Cons:
● Bad UX on server fail
● Not exactly load balanced
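For reference (my sketch, not from the slides), cookie-based sticky sessions in HAProxy only need a `cookie` directive on the backend and a cookie name per server:

```
backend web
    mode http
    balance leastconn
    # insert a SERVERID cookie so each client keeps hitting the same server
    cookie SERVERID insert indirect nocache
    server s1 ip.app1:80 cookie s1
    server s2 ip.app2:80 cookie s2
```

If s1 dies, its clients get rebalanced to s2 and lose their sessions, which is exactly the “bad UX on server fail” con above.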
Memcached
Pros:
● Easy configuration on php.ini
Cons:
● Install memcached, I guess?...
Sessions with memcached
Easy configuration on php.ini (or included files):

session.save_handler = memcache
session.save_path = "tcp://ip.app1,tcp://ip.app2"

memcache.allow_failover = 1
memcache.session_redundancy = 3

Set memcache.session_redundancy to the number of memcached servers + 1.
It’s an off-by-one bug in PHP, open since 2009 (never fixed): https://bugs.php.net/bug.php?id=58585
User assets
CDN
● Heavy changes to codebase
● Lack of control
● More expenses

Samba, NFS
● Single point of failure
● Slow as hell
IPFS
GlusterFS
● Distributed file system
● Replicated mode
● Transparent operation
● Easy CLI configuration:

$ sudo gluster peer probe ip.other.app.server
$ sudo gluster volume create volName replica 2 transport tcp ip.app1:/path ip.app2:/path force
$ sudo gluster volume start volName
$ sudo gluster volume set volName auth.allow ip.app1,ip.app2,127.0.0.1

● fstab configuration:

localhost:/volName /path glusterfs noauto,x-systemd.automount 0 0
Keepalived
Implementation of the Virtual Router Redundancy Protocol (VRRP) – in a nutshell, automatic assignment of IP addresses.
● First and foremost, configure IP forwarding and non-local bind in sysctl.conf:

net/ipv4/ip_forward = 1
net/ipv4/ip_nonlocal_bind = 1
“Jumping” IP addresses can be frowned upon by datacenters. Be sure to talk to your hosting provider about this first.
keepalived.conf (extract)
vrrp_instance VI1 {
    virtual_router_id 50  # mostly arbitrary – make sure it’s unique
    interface NIC
    advert_int 1
    state MASTER  # BACKUP on the other load balancer
    priority 200  # 100 on the other load balancer
    unicast_src_ip this.loadbalancer.ip
    unicast_peer {
        other.loadbalancer.ip
    }
    virtual_ipaddress {
        your.public.ip dev NIC
    }
}
Virtual IP for DB access
vrrp_instance VI2 {
    virtual_router_id 60  # mostly arbitrary – make sure it’s unique
    interface NIC
    advert_int 1
    state MASTER  # BACKUP on the other load balancer
    priority 200  # 100 on the other load balancer
    unicast_src_ip this.loadbalancer.ip
    unicast_peer {
        other.loadbalancer.ip
    }
    virtual_ipaddress {
        a.free.private.ip dev NIC
    }
}
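One thing these extracts leave out (my addition, not on the slide) is a health check: without one, keepalived fails over only when the whole machine dies, not when HAProxy itself crashes. A sketch using a tracked script:

```
vrrp_script chk_haproxy {
    script "killall -0 haproxy"  # exits 0 if an haproxy process exists
    interval 2                   # run every 2 seconds
    weight -150                  # on failure, drop priority 200 below the
                                 # backup's 100, forcing the VIP to move
}

vrrp_instance VI1 {
    # ... same settings as above ...
    track_script {
        chk_haproxy
    }
}
```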
Point the DB connection URL in your codebase at this virtual IP instead.
Don’t forget SSL termination
Two load balancers with failover means two servers where SSL termination happens:
Duplicate your certificates!
Monit
● Monitoring and management
● Can do automatic maintenance and repair
● Can execute arbitrary actions on errors
● Can monitor system, processes, filesystems, scripts...
Monit sample config

check process php with pidfile /var/run/php/php7-fpm.pid
    start program = "/usr/bin/service php7-fpm start"
    stop program = "/usr/bin/service php7-fpm stop"
    if failed unixsocket /var/run/php/php7-fpm.sock then restart
    if 2 restarts within 4 cycles then alert

check filesystem disk with path /
    if space free < 20% then alert

check network private with interface eno1
    start program = "/sbin/ifup eno1"
    stop program = "/sbin/ifdown eno1"
    if failed link for 3 cycles then restart
    if saturation > 90% for 20 cycles then alert
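The same pattern extends to the DB nodes. A hedged sketch (pidfile path and service name are assumptions about the install) for watching a MariaDB node:

```
check process mysql with pidfile /var/run/mysqld/mysqld.pid
    start program = "/usr/bin/service mysql start"
    stop program = "/usr/bin/service mysql stop"
    # speak the MySQL protocol, not just a TCP connect,
    # so a hung server also counts as failed
    if failed port 3306 protocol mysql with timeout 10 seconds then restart
    if 2 restarts within 4 cycles then alert
```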
M/Monit
● Aggregate all your Monit instances
● Awesome UI – it’s even responsive
● Start and stop services from the UI
● Analytics, historical data, trend predictions, real-time charts
● Commercial product, but payment is one-time and the license is perpetual – and it’s cheap, on top*

I’m in no way affiliated with M/Monit. Just love the product!
* In September 2017, it costs 65€ for 5 monitored hosts, up to 699€ for 1000 hosts.
Keeping it secure(-ish)
● As few public IP addresses as possible
● Fail2ban
● SELinux / AppArmor
● No passwordless sudo – ever
● Public key SSH
● External access through the load balancers:
$ ssh -t [email protected] ssh [email protected]
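The two-hop ssh above can also live in ~/.ssh/config via ProxyJump (OpenSSH 7.3+); hostnames and the user here are placeholders, since the originals were redacted:

```
# ~/.ssh/config – hop through the load balancer to reach internal hosts
Host internal-app
    HostName ip.app1
    User deploy                          # placeholder user
    ProxyJump loadbalancer.example.com   # placeholder public host
```

After that, `ssh internal-app` does both hops in one command.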
There’s an app… er, a tool for that
● Centralize logs with the Elastic Stack (Logstash, Elasticsearch and Kibana)
● Manage the crontab with Crontab UI
● DB status and analytics with ClusterControl
● Continuous Integration/Deployment
  – GitLab is FOSS and self-hosted for greater control
One more thing… Two, actually…
Geographic distribution
● Avoid datacenter SPOF
● Watch your latency!
● Should I say it again?…

Containers
● Can be deployed pretty much on demand
● Easily switch hosting (ahem… talk to your hosting provider?)