You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org
Application servers (also image/video scalers and job runners)
Depool an application server before rebooting it.
Spread out HHVM restarts, e.g. by restarting one server at a time and waiting 30 seconds between restarts:
salt -b1 'mw1*' cmd.run 'service hhvm restart; sleep 30;'
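Where salt is not at hand, the same one-host-at-a-time pattern can be sketched as a plain shell loop. This is a minimal sketch that assumes passwordless ssh access to each host. The function name rolling_run, the RUNNER override and the host names in the usage line are hypothetical. Unlike the salt one-liner, it does not sleep after the last host, and it stops at the first failed restart.

```shell
# Hypothetical helper: run CMD on each HOST in turn, waiting DELAY seconds
# between hosts and stopping at the first failure.
# RUNNER defaults to ssh; set RUNNER=echo for a dry run.
rolling_run() {
  local delay="$1" cmd="$2" first=1 host
  shift 2
  for host in "$@"; do
    if [ "$first" -eq 0 ]; then
      sleep "$delay"   # spread the restarts out
    fi
    first=0
    "${RUNNER:-ssh}" "$host" "$cmd" || return 1
  done
}

# Usage (hypothetical host names):
# rolling_run 30 'service hhvm restart' mw1017.eqiad.wmnet mw1018.eqiad.wmnet
```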
Cassandra (as used in aqs and restbase)
Cassandra as used in restbase uses a multi-instance setup, i.e. one host runs multiple Cassandra processes, typically named "a", "b", etc. Each instance has a corresponding nodetool-NAME binary, e.g. nodetool-a status -r. The aqs Cassandra cluster doesn't use multi-instance; there the tool is simply named nodetool (the commands are otherwise equivalent).
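The naming convention can be captured in a small helper that picks the right binary for an instance name. This is a sketch: the function name nodetool_for is hypothetical, and it assumes the nodetool-NAME binaries are on the PATH.

```shell
# Print the nodetool binary for a Cassandra instance name.
# Multi-instance hosts (restbase) use nodetool-NAME; with no name
# (single-instance, e.g. aqs) the plain nodetool binary is used.
nodetool_for() {
  if [ -n "${1:-}" ]; then
    printf 'nodetool-%s\n' "$1"
  else
    printf 'nodetool\n'
  fi
}

# Usage:
# "$(nodetool_for a)" status -r
```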
Before restarting an instance, drain it first:
nodetool-a drain && systemctl restart cassandra-a
nodetool-b drain && systemctl restart cassandra-b
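On a multi-instance host, the drain-and-restart step can be looped over the instance names. This is a minimal sketch that assumes the instances follow the "a", "b" naming above and that it runs as root on the Cassandra host. restart_instances is a hypothetical name. It stops as soon as a drain or restart fails, so a broken instance is not followed by another one going down.

```shell
# Hypothetical helper: drain and restart each named Cassandra instance
# in order, stopping at the first failed step.
restart_instances() {
  local inst
  for inst in "$@"; do
    "nodetool-$inst" drain && systemctl restart "cassandra-$inst" || return 1
  done
}

# Usage on a host with instances "a" and "b":
# restart_instances a b
```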
Before proceeding with the next node, check that the restarted node has correctly rejoined the cluster (use the nodetool binary that matches the restarted instance):
nodetool-a status -r
Directly after the restart, the tool might throw the exception "No nodes are present in the cluster"; this usually resolves itself within a few seconds. If the node has correctly rejoined the cluster, it is listed with the "UN" (Up/Normal) prefix, e.g.:
UN xenon-a.eqiad.wmnet 224.65 GB 256 ? 0d691414-4132-4854-a00d-1d2671e15728 rack1
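The rejoin check can be scripted by looking for the "UN" state on the restarted node's line and retrying for a while, which also rides out the transient "No nodes are present in the cluster" error. This is a minimal sketch: the function names, the default of 30 attempts and the 10-second delay are assumptions, and the parsing relies on the column layout shown above (state in the first column, hostname in the second, as printed by status -r).

```shell
# Succeed if `nodetool status -r` output on stdin lists HOST with state UN
# (Up/Normal). Assumes state in column 1 and hostname in column 2.
is_up_normal() {
  awk -v host="$1" '$1 == "UN" && $2 == host { found = 1 } END { exit !found }'
}

# Poll TOOL (e.g. nodetool-a) until HOST is UN, up to TRIES attempts
# DELAY seconds apart. Errors printed while the node is still starting
# are discarded and simply lead to another attempt.
wait_until_up() {
  local tool="$1" host="$2" tries="${3:-30}" delay="${4:-10}"
  while [ "$tries" -gt 0 ]; do
    if "$tool" status -r 2>/dev/null | is_up_normal "$host"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
# wait_until_up nodetool-a xenon-a.eqiad.wmnet && echo "xenon-a rejoined"
```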
Exim
service exim4 restart
Gerrit
A Gerrit restart should be announced on #wikimedia-operations roughly 15 minutes in advance, to give people a heads-up:
service gerrit restart
Hadoop
Three of the Hadoop workers run an additional JournalNode process. The JournalNode hosts are configured in the Puppet manifest.
- The other Hadoop workers run two services (hadoop-hdfs-datanode and hadoop-yarn-nodemanager), which can be restarted in arbitrary order.
service hadoop-hdfs-datanode restart
service hadoop-yarn-nodemanager restart
- TODO: Add notes for JournalNode hosts
Kafka
Restart or reboot only one Kafka broker at a time:
service kafka restart