You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

MariaDB/Decommissioning a DB Host: Difference between revisions

From Wikitech-static
imported>Volans
m (Use the CuminHosts template instead of hardcoding the hostnames)
imported>Ladsgroup
 

Latest revision as of 10:38, 8 August 2022

Prerequisites:

  • SSH access to one of the cluster management hosts (cumin1001.eqiad.wmnet, cumin2002.codfw.wmnet) to depool the host and run the decommissioning script
  • SSH access to puppetmaster1001.eqiad.wmnet to merge puppet changes
  • Access to Pwstore
  • Git repositories cloned to your host:

Decommissioning workflow:

Create a tracking ticket

  1. Create a decommission ticket with the following template: https://phabricator.wikimedia.org/maniphest/task/edit/form/52/
  2. If there are hardware problems, state them in the ticket so that DC-Ops can label the host and broken parts are not reused.

Depool the host

  1. SSH to one of the cluster management hosts (cumin1001.eqiad.wmnet, cumin2002.codfw.wmnet)
  2. dbctl instance HOSTNAME depool && dbctl config commit -m "Depool HOSTNAME TASKNUMBER"
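
As a minimal sketch of step 2, the command line can be assembled from the host name and the task number, so that the commit message always names the host that is actually being depooled. The function name depool_cmd and the task number T000000 are hypothetical; the function only prints the command for review and does not run dbctl.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the depool command for one host.
# Usage: depool_cmd HOSTNAME TASKNUMBER
depool_cmd() {
    if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
        echo "usage: depool_cmd HOSTNAME TASKNUMBER" >&2
        return 1
    fi
    # The same two dbctl invocations as in the step above, placeholders filled in.
    printf 'dbctl instance %s depool && dbctl config commit -m "Depool %s %s"\n' \
        "$1" "$1" "$2"
}

# Example call with an illustrative host and a placeholder task number.
depool_cmd db1091 T000000
```

Printing instead of executing keeps a human in the loop: the line can be read, then pasted on the cluster management host.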

Remove the host from dbctl

  1. Create a puppet patch (example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/638343)
  2. SSH to puppetmaster1001
  3. sudo puppet-merge - if you see any changes other than yours here, contact the owners to see if these are ok to merge
  4. SSH to one of the cluster management hosts (cumin1001.eqiad.wmnet, cumin2002.codfw.wmnet)
  5. dbctl config commit -m "Remove HOSTNAME from dbctl TASKNUMBER"

Remove all other puppet entries

  1. Create a puppet patch (example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/638352)

Run the decommissioning script

  1. SSH to one of the cluster management hosts (cumin1001.eqiad.wmnet, cumin2002.codfw.wmnet)
  2. Start a screen or tmux session
  3. sudo cookbook sre.hosts.decommission -t TASKNUMBER HOSTNAME.DC.wmnet
  4. Enter the console password from Pwstore when prompted
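
The HOSTNAME.DC.wmnet argument in step 3 is the short host name joined with the datacenter. A small sketch of building that command follows; decommission_cmd is a hypothetical name, and limiting DC to eqiad and codfw is an assumption based only on the two datacenters mentioned on this page. The function prints the command and does not run the cookbook.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the decommission cookbook command.
# Usage: decommission_cmd HOSTNAME DC TASKNUMBER
decommission_cmd() {
    if [ "$#" -ne 3 ]; then
        echo "usage: decommission_cmd HOSTNAME DC TASKNUMBER" >&2
        return 1
    fi
    # Assumption: only the two datacenters named on this page are accepted.
    case "$2" in
        eqiad|codfw) ;;
        *) echo "unknown datacenter: $2" >&2; return 1 ;;
    esac
    printf 'sudo cookbook sre.hosts.decommission -t %s %s.%s.wmnet\n' "$3" "$1" "$2"
}

# Example call with an illustrative host and a placeholder task number.
decommission_cmd db1091 eqiad T000000
```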

Merge puppet change

  1. SSH to puppetmaster1001
  2. sudo puppet-merge - if you see any changes other than yours here, contact the owners to see if these are ok to merge

Remove host from zarcillo

  1. Log the action in IRC (#wikimedia-operations) - !log Removing HOSTNAME from zarcillo TASKNUMBER
  2. SSH to one of the cluster management hosts (cumin1001.eqiad.wmnet, cumin2002.codfw.wmnet)
  3. sudo -i
  4. Remove the host's rows from the zarcillo database:
    1. db-mysql db1115 -A zarcillo
    2. Execute the following queries in the MySQL prompt (remember to end each one with a semicolon):
      1. set binlog_format='ROW';
      2. delete from servers where hostname='HOSTNAME';
      3. delete from instances where name='INSTANCE'; (INSTANCE is normally HOSTNAME or HOSTNAME:PORT)
      4. delete from section_instances where instance = 'INSTANCE';
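
To avoid typos when substituting HOSTNAME and INSTANCE by hand, the four statements can be generated and reviewed before they are pasted into the prompt. This is only a sketch: zarcillo_cleanup_sql is a hypothetical name, defaulting INSTANCE to HOSTNAME is an assumption drawn from the note above, and the port 3311 in the example is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print the zarcillo cleanup statements for one instance.
# Usage: zarcillo_cleanup_sql HOSTNAME [INSTANCE]
# INSTANCE defaults to HOSTNAME (assumption; use HOSTNAME:PORT where needed).
zarcillo_cleanup_sql() {
    if [ -z "${1:-}" ]; then
        echo "usage: zarcillo_cleanup_sql HOSTNAME [INSTANCE]" >&2
        return 1
    fi
    instance="${2:-$1}"
    # Refuse quotes and backslashes so the generated SQL cannot be broken.
    case "$1$instance" in
        *"'"*|*'"'*|*\\*)
            echo "refusing input containing quotes or backslashes" >&2
            return 1 ;;
    esac
    printf "set binlog_format='ROW';\n"
    printf "delete from servers where hostname='%s';\n" "$1"
    printf "delete from instances where name='%s';\n" "$instance"
    printf "delete from section_instances where instance = '%s';\n" "$instance"
}

# Example: a host whose instance name includes a port (illustrative values).
zarcillo_cleanup_sql db1091 db1091:3311
```

On a multi-instance host the instances and section_instances statements are needed once per INSTANCE, so the function would be called once per HOSTNAME:PORT.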

Remove host from orchestrator

  1. Either from the GUI (admin users only)
  2. Or from the CLI:
    1. Log the action in IRC (#wikimedia-operations) - !log Removing HOSTNAME from orchestrator TASKNUMBER
    2. SSH to dborch1001.wikimedia.org
    3. Single-instance host: sudo orchestrator -c forget -i HOSTNAME:3306 (use the FQDN for the HOSTNAME)
    4. Multi-instance host: sudo orchestrator -c forget -i HOSTNAME:PORT for each HOSTNAME:PORT combination (use the FQDN for the HOSTNAME)
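
For a multi-instance host, the forget command has to be repeated for every port. A hedged sketch of generating one line per HOSTNAME:PORT combination follows; forget_cmds is a hypothetical name, the host and the ports 3311 and 3312 are illustrative, and the function prints the commands without running orchestrator.

```shell
#!/bin/sh
# Hypothetical helper: print one "forget" command per instance on a host.
# Usage: forget_cmds FQDN [PORT...]
# With no PORT given, the single-instance default 3306 is used.
forget_cmds() {
    if [ "$#" -lt 1 ] || [ -z "$1" ]; then
        echo "usage: forget_cmds FQDN [PORT...]" >&2
        return 1
    fi
    fqdn="$1"
    shift
    # Fall back to the single-instance port when no ports were passed.
    [ "$#" -gt 0 ] || set -- 3306
    for port in "$@"; do
        printf 'sudo orchestrator -c forget -i %s:%s\n' "$fqdn" "$port"
    done
}

# Single-instance host (illustrative FQDN).
forget_cmds db1091.eqiad.wmnet
# Multi-instance host with two illustrative ports.
forget_cmds db1091.eqiad.wmnet 3311 3312
```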

Update the task and send it to dcops

  1. Mark all the steps listed under "step for service owners" as done on the task (example: https://phabricator.wikimedia.org/T267088)
  2. Reassign the task:
    • for eqiad to wiki_willy
    • for codfw to papaul
  3. Remove the #DBA tag and add #dc-ops plus either #ops-eqiad or #ops-codfw, depending on the datacenter.
  4. Add the following comment: "This host is ready for DC-Ops to decommission".



This page is a part of the SRE Data Persistence technical documentation