MariaDB/Upgrading a section
Note: This is the procedure used for upgrading a section to Buster and MariaDB 10.4. If upgrading to another version, exercise caution (and possibly update this banner).
Note: This document assumes that all replicas with no other hosts hanging below them are already upgraded.
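A quick, if not airtight, way to check that assumption from a cumin host is to ask the candidate replica whether anything is replicating from it (dbXXXX below is a placeholder host name; SHOW SLAVE HOSTS only lists replicas that have registered a report_host, so treat an empty result as a strong hint rather than proof):
mysql.py -hdbXXXX -e "SHOW SLAVE HOSTS"
# An empty result suggests no other hosts hang below this replica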
Order of upgrades
- Upgrade clouddb* hosts.
- Upgrade Sanitarium hosts in both DCs
- Upgrade Sanitarium primaries in both DCs and ensure the sanitarium host hangs from the 10.4 one in the active DC (a quick way to check this is sketched after this list)
- Upgrade the candidate master on the standby DC
- Upgrade the backup source in the standby DC (coordinate with Jaime)
- Upgrade the master in the standby DC
- Upgrade the candidate master in the primary DC
- Upgrade the backup source in the primary DC (coordinate with Jaime)
- Switchover the primary host in the primary DC to a Buster+10.4 host
- Upgrade the old primary and make it a candidate primary
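When rearranging replication for the steps above, a minimal check of which master a replica currently points at and which version it runs could look like this (dbXXXX is again a placeholder host name, and mysql.py is assumed to pass -e output through like the plain mysql client):
mysql.py -hdbXXXX -e "SELECT @@version"
mysql.py -hdbXXXX -e "SHOW SLAVE STATUS\G"
# Check Master_Host (where it replicates from) and Slave_IO_Running / Slave_SQL_Running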
Upgrade procedure
- Patch the dhcp file: [example]
- Run puppet on install1003 and install2003
- Depool the host (if needed) using software/dbtools/depool-and-wait
- Silence the host in Icinga, e.g. on a cumin host:
cookbook sre.hosts.downtime xxxx.wmnet -D1 -t TXXXXXX -r "reimage for upgrade - TXXXXXX"
- Stop MySQL on the host
- Run
umount /srv; swapoff -a
- Run reimage:
sudo -E sudo cookbook sre.hosts.reimage xxxx.wmnet -p TXXXXXX
- Wait until the host is up
- Run
systemctl set-environment MYSQLD_OPTS="--skip-slave-start"
- Run
systemctl start mariadb ; mysql_upgrade
- Run
systemctl restart prometheus-mysqld-exporter.service
- Drop the host from Tendril and re-add it, otherwise its metrics won't be updated in Tendril
- Check all the tables before starting replication (this can take up to 24h depending on the section)
- In a screen run:
mysqlcheck --all-databases
- If any corruption is discovered, fix it with the following:
# Extract the names of the tables flagged as corrupted from the MariaDB journal
journalctl -xe -u mariadb | grep table | grep Flagged | awk -F "table" '{print $2}' | awk -F " " '{print $1}' | tr -d '`' | uniq >> /root/to_fix
# Rebuild each flagged table, without writing the ALTERs to the binlog
for i in $(cat /root/to_fix); do echo $i; mysql -e "set session sql_log_bin=0; alter table $i engine=InnoDB, force"; done
- In a screen, start the replica (see the catch-up and repool sketch after this list)
- Wait until the host is up
- Repool the host.
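For the depool, catch-up and repool steps above, a rough sketch of the equivalent manual commands, reusing the dbctl and dbtools invocations from the minor-version section below (db1153 and T295026 stand in for the actual host and task, and the exact repool percentages depend on the host):
dbctl instance db1153 depool
dbctl config commit -m "Depooling db1153 for upgrade T295026"
# ... reimage, mysql_upgrade, table check and start of replication as above ...
# Wait until Seconds_Behind_Master reaches 0 and both replication threads are running
mysql.py -hdb1153 -e "SHOW SLAVE STATUS\G"
# Repool gradually once the host has caught up
./dbtools/repool db1153 "After upgrade T295026" 10 25 75 100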
Upgrading MariaDB minor version
dbctl instance db1153 depool
dbctl config diff
dbctl config commit -m "Depooling db1153 for mysql upgrade T295026"
mysql.py -hdb1153 -e "show processlist" and check that no process is using it
cookbook sre.hosts.downtime --hours 2 -r "Maintenance T295026" db1153.eqiad.wmnet
ssh into the host (and become root)
stop slave;
SET GLOBAL innodb_buffer_pool_dump_at_shutdown = OFF;
systemctl stop mariadb
apt full-upgrade
!log Upgrade db1153 T295026
(if the linux kernel got updated as well):
df -hT
umount /srv
reboot
systemctl set-environment MYSQLD_OPTS="--skip-slave-start"
systemctl start mariadb
mysql_upgrade
mysql -e "start slave" (locally) or mysql.py -hdb1153 -e "start slave" from cumin
Wait for it to catch up in replication
(in a screen on cumin)
./dbtools/repool db1153 "After upgrade T295026" 50 100
(for hosts getting traffic: 10 25 75 100)
mark it in https://phabricator.wikimedia.org/T295026
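Before repooling, it may be worth a quick sanity check that the expected server version is actually running and that innodb_buffer_pool_dump_at_shutdown is back to its configured value (the SET GLOBAL above only lasts until the restart); for example:
mysql.py -hdb1153 -e "SELECT @@version, @@innodb_buffer_pool_dump_at_shutdown"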
This page is a part of the SRE Data Persistence technical documentation
(go here for a list of all our pages)