External storage/Maintenance

How to safely perform maintenance on external storage boxes...

ES boxes are Apaches running an instance of MySQL for some simple blob storage. A few things you need to know:

  1. They come in clusters.
    You can check /h/w/php-1.5/db.php for cluster<->machine assignments
  2. Each cluster has a master and one or more slaves.
    The master is the first to appear in the cluster's list in db.php... but for some perverse reason it is usually the highest-numbered server (e.g., srv146 master, srv145 and srv144 slaves)
    It should always be safe to take down a slave for maintenance.
    For older, read-only clusters, taking the master down is also safe. Reads will fail-over to the slaves.
  3. Only the last couple of clusters are active for writes.
    These are listed in $wgDefaultExternalStore at the end of db.php
    If you're going to shut down the master of one of these clusters, you should remove it from $wgDefaultExternalStore temporarily; otherwise some page saves will fail while it's down.
  4. If MySQL doesn't automatically start when you reboot the machine, punch it manually!
    /etc/init.d/mysqld start should usually do it.
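
The rules in points 2 and 3 can be sketched as a small shell helper. This is only an illustration, not an existing tool: the function name maintenance_advice and its arguments are hypothetical, and you still have to read the host order and the write-active status out of db.php yourself.

```shell
#!/bin/sh
# Sketch of the maintenance rules above. Nothing here reads db.php;
# the caller passes in what db.php says.

# maintenance_advice HOST WRITE_ACTIVE HOSTS...
#   HOST          machine you want to take down
#   WRITE_ACTIVE  "yes" if the cluster is listed in $wgDefaultExternalStore
#   HOSTS...      the cluster's machines in db.php order (master first)
maintenance_advice() {
    host=$1
    write_active=$2
    shift 2
    master=$1    # first entry in the cluster's list is the master

    # Make sure the host actually belongs to this cluster.
    found=no
    for h in "$@"; do
        [ "$h" = "$host" ] && found=yes
    done
    if [ "$found" = no ]; then
        echo "unknown: $host is not in this cluster"
        return 1
    fi

    if [ "$host" != "$master" ]; then
        echo "safe: $host is a slave"
    elif [ "$write_active" = yes ]; then
        echo "caution: $host is a write master; remove its cluster from \$wgDefaultExternalStore first"
    else
        echo "safe: $host is a read-only master; reads fail over to the slaves"
    fi
}
```

For the example cluster in point 2, assuming it is one of the older read-only clusters, `maintenance_advice srv145 no srv146 srv145 srv144` prints `safe: srv145 is a slave`.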

In rarer cases it may be necessary to fix replication or re-clone databases to replace a dead slave. These exercises are left to the reader.
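
Before deciding that replication needs fixing, it helps to confirm that it is actually broken. The sketch below only checks; it does not repair anything. It assumes you can run the mysql client on the slave with working credentials, and the function name replication_ok is hypothetical. It reads the output of MySQL's SHOW SLAVE STATUS statement and succeeds only when both replication threads report Yes.

```shell
#!/bin/sh
# Reads the output of:  mysql -e 'SHOW SLAVE STATUS\G'  on stdin.
# Exit status 0 if both replication threads are running, 1 otherwise.
# Empty input (the server is not configured as a slave) also yields 1.
replication_ok() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }           # strip the left padding of \G output
        $1 == "Slave_IO_Running"  { io  = $2 }
        $1 == "Slave_SQL_Running" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

On a slave this could be used as `mysql -e 'SHOW SLAVE STATUS\G' | replication_ok || echo "replication needs attention"`.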