You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

HAProxy: Difference between revisions

From Wikitech-static
Jump to navigation Jump to search
imported>Marostegui
(Add how to find out if a proxy is in use or not.)
imported>Dzahn
(fix/adjust path to socket to match what is nowadays in production on dbproxy1017 , dbproxy1021)
Line 37: Line 37:
You can check the status of a proxy by running as root from localhost:
You can check the status of a proxy by running as root from localhost:


  $ echo "show stat" | socat unix-connect:/tmp/haproxy.socket stdio
  $ echo "show stat" | socat unix-connect:/run/haproxy/haproxy.sock stdio
  # pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
  # pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
  mariadb,FRONTEND,,,0,2,5000,361,60844,537174,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,1,0,1,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,
  mariadb,FRONTEND,,,0,2,5000,361,60844,537174,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,1,0,1,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,
Line 54: Line 54:
if you run <code>sudo systemctl reload haproxy</code>, because the original server has recovered, you will now get:
if you run <code>sudo systemctl reload haproxy</code>, because the original server has recovered, you will now get:


  $ echo "show stat" | socat unix-connect:/tmp/haproxy.socket stdio
  $ echo "show stat" | socat unix-connect:/run/haproxy/haproxy.sock stdio
  # pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
  # pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
  mariadb,FRONTEND,,,0,1,5000,3,450,789,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,1,0,1,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,
  mariadb,FRONTEND,,,0,1,5000,3,450,789,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,1,0,1,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,
Line 75: Line 75:
== Other interesting commands ==
== Other interesting commands ==


  $ echo "show info" | socat unix-connect:/tmp/haproxy.socket stdio
  $ echo "show info" | socat unix-connect:/run/haproxy/haproxy.sock stdio
  Name: HAProxy
  Name: HAProxy
  Version: 1.5.8
  Version: 1.5.8

Revision as of 01:23, 15 January 2020

There are several services that are load-balanced behind a proxy with HAProxy, at this point mostly databases. HAProxy is a mature, open-source, simple, not-crazily-full-of-features, but reliable and efficient ("just does the job") L3/L4 (tcp/ip) proxy, allowing to be used for both load balancing and automatic switchover.

However, it also provides [some] L7 functionalities for HTTP/HTTPS, and some L7 simple checks for things like MySQL. When more complex checks are needed, typical deployments setup an open ip socket that responds an HTTP code to control the proxy behaviour, for example: https://www.percona.com/doc/percona-xtradb-cluster/5.7/howtos/haproxy.html

Find out if a proxy is being actively used

Not all services are using a proxy as of today (24th Sept 2019), even though they have one ready to be used. To find out which proxies are in use you can run:

host m1-master ; host m2-master ; host m3-master ; host labsdb-analytics ; host labsdb-web

If the result points to a dbproxy, it means it is in use. If it points to a database, it means there is no proxy being used in front of it.

Failover

The typical server is configured like this (/etc/haproxy/conf.d/*). Here it knows about a primary (DB master) a secondary (DB replication slave) but only one node is active at any timeː

listen mariadb 0.0.0.0:3306
   mode tcp
   balance roundrobin
   option tcpka
   option mysql-check user haproxy
   server <%= @primary_name %> <%= @primary_addr %> check inter 3s fall 3 rise 99999999
   server <%= @secondary_name %> <%= @secondary_addr %> check backup

If the primary fails health checks the backup is brought online. The rise 99999999 trick (about 10 years) means that the primary does not come back without human intervention, even if it starts passing HAProxy health checks again. This prevents flopping back and forth once a failure happens, something worse than just

Now, this all sounds good, but there are still some catches:

  • At present many misc slaves are still running read_only=1 so read traffic will fail over but writes will start to be blocked until a human verifies that the old master is properly dead and runs SET GLOBAL read_only=0;. Applications on m2 like gerrit, ieg, otrs, exim and scholarships will complain but remain semi-useful.
  • Persistent connections like those from the eventlogging, bacula, phabricator, etc. did not failover nicely during trials, instead hitting a TCP timeout and causing just about as much annoyance (and backfilling) as having no HAProxy at all. This needs more research.

When a dbproxy complains, it will give a non-critical (will not page) error with:

<icinga-wm> PROBLEM - haproxy failover on dbproxy1010 is CRITICAL: CRITICAL check_failover servers up 1 down 1

You can check the status of a proxy by running as root from localhost:

$ echo "show stat" | socat unix-connect:/run/haproxy/haproxy.sock stdio
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
mariadb,FRONTEND,,,0,2,5000,361,60844,537174,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,1,0,1,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,
mariadb,labsdb1009,0,0,0,2,,226,40594,501669,,0,,0,0,0,0,DOWN,1,1,0,3,1,138,138,,1,2,1,,226,,2,0,,1,L4TOUT,,3000,,,,,,,0,,,,2,0,,,,,138,,,0,0,0,7658,
mariadb,labsdb1010,0,0,0,1,,135,20250,35505,,0,,0,0,0,0,UP,1,0,1,0,0,1543670,0,,1,2,2,,135,,2,1,,1,L7OK,0,0,,,,,,,0,,,,0,0,,,,,1,5.5.5-10.1.19-MariaDB,,0,0,0,1,
mariadb,BACKEND,0,0,0,2,500,361,60844,537174,0,0,,0,0,0,0,UP,1,0,1,,0,1543670,0,,1,2,0,,361,,1,1,,1,,,,,,,,,,,,,,2,0,0,0,0,0,1,,,0,0,0,5882,

Here we see that labsdb1009 has gone down, and labsdb1010 is now the server serving the proxy backend, which is still up.

So for the present, if a dbproxyXXX complains:

  1. Check that original master is really down. If not, restart haproxy on dbproxy1002 and figure out why health checks failed.
  2. If the master is fubar ensure its mysqld is stopped before setting read_only=0 on the slave.
  3. If the slave is fubar most apps probably don't care, so do nothing.

if you run sudo systemctl reload haproxy, because the original server has recovered, you will now get:

$ echo "show stat" | socat unix-connect:/run/haproxy/haproxy.sock stdio
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
mariadb,FRONTEND,,,0,1,5000,3,450,789,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,1,0,1,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,
mariadb,labsdb1009,0,0,0,1,,3,450,789,,0,,0,0,0,0,UP,1,1,0,0,0,3,0,,1,2,1,,3,,2,1,,1,L7OK,0,0,,,,,,,0,,,,0,0,,,,,1,5.5.5-10.1.19-MariaDB,,0,0,0,1,
mariadb,labsdb1010,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,0,1,0,0,3,0,,1,2,2,,0,,2,0,,0,L7OK,0,0,,,,,,,0,,,,0,0,,,,,-1,5.5.5-10.1.19-MariaDB,,0,0,0,0,
mariadb,BACKEND,0,0,0,1,500,3,450,789,0,0,,0,0,0,0,UP,1,1,1,,0,3,0,,1,2,0,,3,,1,1,,1,,,,,,,,,,,,,,0,0,0,0,0,0,1,,,0,0,0,1,

On IRC:

<icinga-wm> RECOVERY - haproxy failover on dbproxy1010 is OK: OK check_failover

(all servers are back)

Reloading configuration

Two annoying particularities of haproxy:

  • HAProxy, as of this date, doesn't automatically read all configuration files inside `/etc/haproxy/conf.d`, this is workarounded by a preexecution systemd script (generate_haproxy_default.sh that reads all files present there and generates a manual command line on /etc/default/haproxy. This can be misleading if you delete configuration files on puppet but not physically.
  • HAProxy is able to reload (and reset its status) in a clean (and recommended way) by using reload. However, if you change the config file names or its number, you need to restart the service (which is fast, but drops connections. This is a limitation of HAProxy itself and not our puppetization.

Other interesting commands

$ echo "show info" | socat unix-connect:/run/haproxy/haproxy.sock stdio
Name: HAProxy
Version: 1.5.8
Release_date: 2014/10/31
Nbproc: 1
Process_num: 1
Pid: 25297
Uptime: 0d 0h59m29s
Uptime_sec: 3569
Memmax_MB: 0
Ulimit-n: 4033
Maxsock: 4033
Maxconn: 2000
Hard_maxconn: 2000
CurrConns: 0
CumConns: 330
CumReq: 330
MaxSslConns: 0
CurrSslConns: 0
CumSslConns: 0
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 0
ConnRateLimit: 0
MaxConnRate: 1
SessRate: 0
SessRateLimit: 0
MaxSessRate: 1
SslRate: 0
SslRateLimit: 0
MaxSslRate: 0
SslFrontendKeyRate: 0
SslFrontendMaxKeyRate: 0
SslFrontendSessionReuse_pct: 0
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 0
SslCacheLookups: 0
SslCacheMisses: 0
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
ZlibMemUsage: 0
MaxZlibMemUsage: 0
Tasks: 6
Run_queue: 1
Idle_pct: 100
node: dbproxy1010
description: