MariaDB/troubleshooting/external storage failover
Revision as of 07:45, 14 August 2018 by imported>Jcrespo (saving steps)
- Old master: es1014
- New master: es1017
- Check for passwords in the old format and fix grants if needed CHECKED (es1017 looks good)
- Set expire_logs_days to 30 on the new master DONE
set global expire_logs_days = 30;
- Check that pt-heartbeat is running with the latest puppet parameters, that is (note the user, host and defaults file): CHECKED
/usr/bin/perl /usr/local/bin/pt-heartbeat-wikimedia --defaults-file=/dev/null --user=root --host=localhost -D heartbeat --shard={shard} --datacenter={dc} --update --replace --interval=1 --set-vars=binlog_format=STATEMENT -S {socket} --daemonize --pid /var/run/pt-heartbeat.pid
and not with the older format, still present on some masters:
/usr/bin/perl /usr/local/bin/pt-heartbeat-wikimedia --defaults-file=/root/.my.cnf -D heartbeat --shard={shard} --datacenter={dc} --update --replace --interval=1 --set-vars=binlog_format=STATEMENT -S {socket} --daemonize --pid /var/run/pt-heartbeat.pid
- Silence alerts on all hosts DONE
- Disable GTID on es1017 DONE
- Move replicas (es1019, es2017) under the new master (es1017) DONE
- Disable puppet on es1017, es1014 DONE
@es1014> puppet agent --disable "Switching over es3 from es1014 to es1017"
@es1017> puppet agent --disable "Switching over es3 from es1014 to es1017"
- Merge puppet patch and deploy it: DONE
https://gerrit.wikimedia.org/r/447584
- Merge mediawiki patch and rebase on deployment host DONE
https://gerrit.wikimedia.org/r/447586
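The pt-heartbeat check in the preparation steps above can be done by eye, but a small helper makes it harder to miss. The following is a minimal sketch, not part of the documented procedure: the function name `heartbeat_format` is hypothetical, and it only distinguishes the two invocations quoted in this checklist (new format with `--defaults-file=/dev/null --user=root --host=localhost`, old format with `--defaults-file=/root/.my.cnf`).

```shell
# Classify a pt-heartbeat-wikimedia command line as "new", "old" or "unknown".
# Assumption: only the two command lines quoted in this checklist are known.
heartbeat_format() {
    case "$1" in
        *--defaults-file=/dev/null*--user=root*--host=localhost*)
            echo "new" ;;
        *--defaults-file=/root/.my.cnf*)
            echo "old" ;;
        *)
            echo "unknown" ;;
    esac
}

# Possible use on the master (the [p] keeps grep from matching itself):
# heartbeat_format "$(ps -eo args= | grep '[p]t-heartbeat-wikimedia')"
```

An "old" or "unknown" result would suggest restarting pt-heartbeat with the current puppet parameters before going further.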
(actual deployment starts here)
- !log the actions about to take place DONE
!log switchover es3 eqiad master from es1014 to es1017
- Run switchover script from neodymium DONE
./switchover.py --skip-slave-move es1014 es1017
[Servers sync at master: es1014-bin.002508:184384418 slave: es1017-bin.002491:41215873]
- Deploy mediawiki change (deployment.eqiad.wmnet) DONE
scap sync-file --force wmf-config/db-eqiad.php "Switchover es3 master eqiad from es1014 to es1017"
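If the binlog coordinates printed by the switchover script need to be recorded (for example in the ticket), they can be pulled out of the "Servers sync at" line. This is a minimal sketch that assumes the line keeps the `master: file:position slave: file:position` shape shown above; the function name `sync_coords` is hypothetical.

```shell
# Print "binlog_file position" for the given role (master or slave),
# taken from the sync line printed by the switchover script.
sync_coords() {
    role="$1"
    line="$2"
    printf '%s\n' "$line" |
        sed -n "s/.*${role}: \([^: ]*\):\([0-9][0-9]*\).*/\1 \2/p"
}

line='Servers sync at master: es1014-bin.002508:184384418 slave: es1017-bin.002491:41215873'
sync_coords master "$line"
sync_coords slave "$line"
```

The function prints nothing when the line does not match, so an empty result is a signal to read the script output by hand.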
(main deployment finishes here)
- Run puppet on es1014 and es1017, and make sure it doesn't break anything DONE
@es1014> puppet agent --enable && puppet agent -tv
@es1017> puppet agent --enable && puppet agent -tv
- Check semisync and gtid status of all related servers DONE
- Reflect the change in the DNS CNAME: DONE
https://gerrit.wikimedia.org/r/447587
- Update tendril, zarcillo: DONE
mysql.py -A -h db1115 tendril -e "update shards set master_id=1231 WHERE name='es3' LIMIT 1"
mysql.py -A -h db1115 zarcillo -e "UPDATE masters SET instance = 'es1017' WHERE section='es3' and dc = 'eqiad' LIMIT 1"
- Update and close the ticket https://phabricator.wikimedia.org/T197073
- Perform planned maintenance on es1014 (upgrade socket location, upgrade mysql, upgrade kernel, make sure firmware is deployed, change old format passwords if any)
- Remove the accounts 'repl'@'10.%', 'repl'@'208.80.152.%' and 'repl'@'10.0.%' (and possibly others) from es1017, and possibly from other hosts too DONE (es1017 for now)
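For the account cleanup in the last step, one option is to generate the statements first and review them before running anything. The sketch below only prints `DROP USER` statements for the three host patterns named above; the helper name `drop_repl_statements` is hypothetical, and whether further accounts or hosts need the same treatment is left open, as in the step itself.

```shell
# Print one DROP USER statement per host pattern, for review before execution.
# Nothing is sent to the database by this function.
drop_repl_statements() {
    for host in "$@"; do
        printf "DROP USER 'repl'@'%s';\n" "$host"
    done
}

drop_repl_statements '10.%' '208.80.152.%' '10.0.%'
```

The printed statements can then be checked against the accounts that actually exist on the host (for instance with `SELECT user, host FROM mysql.user WHERE user = 'repl';`) before being executed.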