MariaDB/Upgrading a section
This is the procedure used for upgrading a section to Buster and MariaDB 10.4. If upgrading to another version, exercise caution (and possibly update this banner).
This document assumes that all replicas with no other hosts hanging below them are already upgraded (a quick way to check this is sketched after the Order of upgrades list below).
Order of upgrades
- Upgrade clouddb* hosts.
- Upgrade Sanitarium hosts in both DCs
- Upgrade the Sanitarium primaries in both DCs and ensure the Sanitarium host hangs from the 10.4 one in the active DC
- Upgrade the candidate master on the standby DC
- Upgrade the backup source in the standby DC (coordinate with Jaime)
- Upgrade the master in the standby DC
- Upgrade the candidate master in the primary DC
- Upgrade the backup source in the primary DC (coordinate with Jaime)
- Switch over the primary host in the primary DC to a Buster+10.4 host
- Upgrade the old primary and make it a candidate primary
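Before upgrading a host at any step of this order, it can help to confirm that nothing still replicates from it. The following is a minimal sketch using standard MariaDB commands run on the host itself; the access method (plain sudo mysql, no extra credentials) is an assumption and may differ locally.

# List replicas currently connected to this host; an empty result means
# no other hosts hang below it (replicas only show up if report_host is set).
sudo mysql -e "SHOW SLAVE HOSTS;"
# Alternatively, look for replication dump threads in the process list:
sudo mysql -e "SHOW PROCESSLIST;" | grep -i "binlog dump"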
Upgrade procedure
- Patch the DHCP file: [example]
- Run puppet on install1003 and install2003
- Depool the host (if needed) using software/dbtools/depool-and-wait
- Silence the host in Icinga (e.g. on a cumin host: cookbook sre.hosts.downtime xxxx.wmnet -D1 -t TXXXXXX -r "reimage for upgrade - TXXXXXX")
- Stop MySQL on the host
- Run umount /srv; swapoff -a
- Run reimage: sudo cookbook sre.hosts.reimage xxxx.wmnet -p TXXXXXX
- Wait until the host is up
- Run systemctl set-environment MYSQLD_OPTS="--skip-slave-start"
- Run systemctl start mariadb ; mysql_upgrade
- Run systemctl restart prometheus-mysqld-exporter.service
- Drop the host from Tendril and re-add it, otherwise its metrics won't get updated in Tendril
- Check all the tables before starting replication (this can take up to 24h depending on the section)
- In a screen, run: mysqlcheck --all-databases
- If any corruption is discovered, fix it with the following (a commented version of this one-liner is sketched after this list): journalctl -xe -u mariadb | grep table | grep Flagged | awk -F "table" '{print $2}' | awk -F " " '{print $1}' | tr -d "\`" | uniq >> /root/to_fix ; for i in `cat /root/to_fix`; do echo $i; mysql -e "set session sql_log_bin=0; alter table $i engine=InnoDB, force"; done
- In a screen, start the replica
- Wait until the replica has caught up (see the lag check sketched after this list)
- Repool the host.
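For readability, here is an equivalent, commented sketch of the corruption-fix one-liner above. It parses the same journal messages and uses the same /root/to_fix scratch file; verify the extracted table names before running the ALTERs.

# Extract the names of tables the mariadb journal flagged as corrupted,
# exactly as the one-liner does, and append them to /root/to_fix.
journalctl -xe -u mariadb \
  | grep table | grep Flagged \
  | awk -F "table" '{print $2}' | awk '{print $1}' \
  | tr -d '`' | uniq >> /root/to_fix

# Rebuild each flagged table with InnoDB, skipping the binlog so the
# ALTER does not replicate to other hosts.
while read -r table; do
  echo "$table"
  mysql -e "SET SESSION sql_log_bin=0; ALTER TABLE $table ENGINE=InnoDB, FORCE;"
done < /root/to_fix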
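A minimal sketch of the "start the replica" and "wait until it has caught up" steps, using only standard MariaDB statements (nothing WMF-specific; local access via sudo mysql is an assumption):

# Start replication (it was not auto-started because of --skip-slave-start).
sudo mysql -e "START SLAVE;"
# Both replication threads should be running and Seconds_Behind_Master
# should drop to 0 (and stay there) before the host is repooled.
sudo mysql -e "SHOW SLAVE STATUS\G" | grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"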
This page is a part of the SRE Data Persistence technical documentation