MariaDB/Decommissioning a DB Host


Revision as of 10:41, 3 August 2021


Prerequisites:

Decommissioning workflow:

Create a tracking ticket

  1. Create a decommission ticket with the following template example: https://phabricator.wikimedia.org/T197063
  2. If there are hardware problems, state them in the ticket so that DC-Ops can label the host and broken parts are not reused.

Depool the host

  1. SSH to cumin1001
  2. dbctl instance HOSTNAME depool && dbctl config commit -m "Depool HOSTNAME TASKNUMBER"
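
The depool step above can be wrapped in a small dry-run helper so that the commit message always names the host being depooled. This is only a sketch: the function name depool_cmd is hypothetical, the host and task values are examples, and the helper prints the command instead of running it. It uses only the dbctl invocations already shown in this step.

```shell
# Hypothetical helper: print (do not run) the depool command for one host.
# Only the dbctl invocations shown in the step above are used.
depool_cmd() {
    local host="$1" task="$2"
    if [ -z "$host" ] || [ -z "$task" ]; then
        echo "usage: depool_cmd HOSTNAME TASKNUMBER" >&2
        return 1
    fi
    printf 'dbctl instance %s depool && dbctl config commit -m "Depool %s %s"\n' \
        "$host" "$host" "$task"
}

# Example values only; review the printed line before pasting it on the cumin host.
depool_cmd db1091 T197063
```

Printing rather than executing keeps a human in the loop for a change that affects production traffic.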

Remove the host from dbctl

  1. Create a puppet patch (example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/638343)
  2. SSH to puppetmaster1001
  3. sudo -i
  4. puppet-merge - if the diff shows changes other than yours, ask their owners whether they are OK to merge
  5. SSH to cumin1001
  6. dbctl config commit -m "Remove HOSTNAME from dbctl TASKNUMBER"

Remove all other puppet entries

  1. Create a puppet patch (example: https://gerrit.wikimedia.org/r/c/operations/puppet/+/638352)

[Only codfw hosts, until they are migrated to Netbox] Remove DNS production entries

  1. Create a DNS patch (example: https://gerrit.wikimedia.org/r/c/operations/dns/+/632836)

Run the decommissioning script

  1. SSH to cumin1001
  2. sudo -i
  3. cookbook sre.hosts.decommission -t TASKNUMBER HOSTNAME.DC.wmnet
  4. Enter console password from Pwstore
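
The cookbook above expects the host as HOSTNAME.DC.wmnet. A minimal sketch of building that name follows. The function host_fqdn is hypothetical, and restricting DC to eqiad and codfw is an assumption based on the two sites named on this page. The cookbook command is only printed, not executed.

```shell
# Hypothetical helper: build the FQDN in the HOSTNAME.DC.wmnet form.
# Assumption: only the two sites mentioned on this page are accepted.
host_fqdn() {
    local host="$1" dc="$2"
    case "$dc" in
        eqiad|codfw) printf '%s.%s.wmnet\n' "$host" "$dc" ;;
        *) echo "unknown DC: $dc" >&2; return 1 ;;
    esac
}

# Example values only; the command is printed for review, not run.
fqdn=$(host_fqdn db1091 eqiad) &&
    echo "cookbook sre.hosts.decommission -t T197063 $fqdn"
```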

Merge puppet + DNS changes

  1. SSH to puppetmaster1001
  2. sudo -i
  3. puppet-merge - if the diff shows changes other than yours, ask their owners whether they are OK to merge

Run homer to disable the switch port (required until https://phabricator.wikimedia.org/T265342 is completed)

  1. Locate the host in Netbox; the bottom of its page shows which switch it is connected to
  2. SSH to cumin1001
  3. Run homer and check that the diff removes the host's port from the private VLAN range and adds it to the disabled range:
> homer asw2-c-eqiad* commit "T268812"
INFO:homer.devices:Initialized 35 devices
INFO:homer:Committing config for query asw2-c-eqiad* with message: T268812
INFO:homer:Gathering global Netbox data
INFO:homer.devices:Matched 1 device(s) for query 'asw2-c-eqiad*'
INFO:homer:Generating configuration for asw2-c-eqiad.mgmt.eqiad.wmnet
Configuration diff for asw2-c-eqiad.mgmt.eqiad.wmnet:

[edit interfaces interface-range disabled]
     member ge-2/0/22 { ... }
+    member ge-2/0/23;
     member ge-2/0/26 { ... }
[edit interfaces interface-range vlan-private1-c-eqiad]
-    member ge-2/0/23;
[edit interfaces]
-   ge-2/0/23 {
-       description "es1016:eno1 {#}";
-   }
  4. Type "yes" to commit, "no" to abort.
> yes
INFO:homer.transports.junos:Committing the configuration on asw2-c-eqiad.mgmt.eqiad.wmnet
INFO:homer:Homer run completed successfully on 1 devices: ['asw2-c-eqiad.mgmt.eqiad.wmnet']
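
In the transcript above the device query ends in an unquoted *, which the shell may expand if a matching file happens to exist in the current directory. The sketch below prints the same homer invocation with the query quoted. The function name homer_commit_cmd is hypothetical, and the switch and task values are the ones from the transcript.

```shell
# Hypothetical helper: print the homer commit command with the query quoted,
# so the trailing * reaches homer instead of being expanded by the shell.
homer_commit_cmd() {
    local switch_prefix="$1" task="$2"
    if [ -z "$switch_prefix" ] || [ -z "$task" ]; then
        echo "usage: homer_commit_cmd SWITCH_PREFIX TASKNUMBER" >&2
        return 1
    fi
    printf "homer '%s*' commit \"%s\"\n" "$switch_prefix" "$task"
}

# Values taken from the transcript above; the command is only printed.
homer_commit_cmd asw2-c-eqiad T268812
```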

Remove host from tendril and zarcillo

  1. Log the action in IRC (#wikimedia-operations) - !log Removing HOSTNAME from tendril and zarcillo TASKNUMBER
  2. SSH to cumin1001
  3. sudo -i
  4. Tendril:
    1. cd /home/marostegui/git/tendril/bin/
    2. for i in HOSTNAME(s); do echo $i; ./tendril-host-drop.sh $i.DC.wmnet 3306 ~/.my.cnf.tendril tendril | mysql -h db1115.eqiad.wmnet tendril; done - this can take a while sometimes, depending on how busy tendril is
  5. Zarcillo
    1. mysql.py -h db1115 -A zarcillo
    2. Execute the following queries in the MySQL prompt (remember the trailing semicolon):
      1. delete from servers where hostname='HOSTNAME';
      2. delete from instances where name='INSTANCE'; (INSTANCE is normally HOSTNAME or HOSTNAME:PORT)
      3. delete from section_instances where instance = 'INSTANCE';
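
To avoid typing the three zarcillo statements by hand, they can be generated for review first. This is a sketch under assumptions: the function zarcillo_delete_sql is hypothetical, the second example instance name is made up for illustration, and the output is meant to be read and then pasted into the MySQL prompt, not piped in blindly.

```shell
# Hypothetical helper: emit the three zarcillo DELETE statements for one host.
# INSTANCE is normally HOSTNAME or HOSTNAME:PORT, so it defaults to HOSTNAME.
zarcillo_delete_sql() {
    local host="$1" instance="${2:-$1}"
    [ -n "$host" ] || { echo "usage: zarcillo_delete_sql HOSTNAME [INSTANCE]" >&2; return 1; }
    # Refuse anything that is not a plain host or host:port style name.
    case "$host$instance" in
        *[!A-Za-z0-9.:-]*) echo "unexpected character in name" >&2; return 1 ;;
    esac
    cat <<EOF
delete from servers where hostname='${host}';
delete from instances where name='${instance}';
delete from section_instances where instance='${instance}';
EOF
}

# Example values only.
zarcillo_delete_sql db1091
zarcillo_delete_sql db1099 db1099:3311
```

The character check is deliberately strict, since the names are interpolated directly into SQL text.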

Remove host from orchestrator

Note: Orchestrator will purge the host automatically within 1-2 weeks, but to avoid that delay it should be removed manually.

  1. From the GUI (admin users only)
  2. From the CLI:
    1. Log the action in IRC (#wikimedia-operations) - !log Removing HOSTNAME from orchestrator TASKNUMBER
    2. SSH to dborch1001.wikimedia.org
    3. Single-instance host: sudo orchestrator -c forget -i HOSTNAME:3306 (use the FQDN for the HOSTNAME)
    4. Multi-instance host: sudo orchestrator -c forget -i HOSTNAME:PORT for each HOSTNAME:PORT combination (use the FQDN for the HOSTNAME)
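
For a multi-instance host the forget command has to be repeated once per port. The sketch below prints one command per HOSTNAME:PORT pair and falls back to 3306 when no port is given, matching the single-instance case above. The function name orchestrator_forget_cmds is hypothetical and the hosts and ports are example values.

```shell
# Hypothetical helper: print one "forget" command per instance port.
# With no ports given it falls back to 3306, as for a single-instance host.
orchestrator_forget_cmds() {
    if [ "$#" -eq 0 ]; then
        echo "usage: orchestrator_forget_cmds FQDN [PORT...]" >&2
        return 1
    fi
    local fqdn="$1" port
    shift
    [ "$#" -gt 0 ] || set -- 3306
    for port in "$@"; do
        printf 'sudo orchestrator -c forget -i %s:%s\n' "$fqdn" "$port"
    done
}

# Example values only; the commands are printed for review, not run.
orchestrator_forget_cmds db1091.eqiad.wmnet
orchestrator_forget_cmds db1099.eqiad.wmnet 3311 3318
```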

Update the task and send it to dcops

  1. Mark all the items under "step for service owners" as done on: https://phabricator.wikimedia.org/T267088
  2. Reassign:
    • for eqiad to wiki_willy
    • for codfw to papaul
  3. Remove #DBA tag and add #dc-ops and #ops-eqiad OR #ops-codfw.
  4. Add the following comment: "This host is ready for DC-Ops to decommission".



This page is a part of the SRE Data Persistence technical documentation