You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

MariaDB/Switch Datacenter

Revision as of 13:58, 11 November 2021 by imported>Kormat (Use gerrit: links)


The week before the switchover

  • 7 days before: no more maintenance on the database clusters.
  • 6 days before: Enable circular replication between eqiad and codfw.
    • This requires updating section_params in hieradata/common/profile/mariadb.yaml. E.g. gerrit:719168
  • In the new DC:
    • Check whether GTID is enabled on the primaries, and disable it if so.
    • Check that all replicas have GTID enabled.
    • Check for disabled notifications (icinga)/silences (alertmanager).
    • Check that the query killers are installed and enabled.
    • Review MW weights, comparing them to the old DC.
    • Warm up the caches using queries from the old DC.
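
The GTID checks above can be sketched as a small shell helper. This is only an illustration, not an existing tool: the function name gtid_mode is hypothetical, and it assumes the mysql client can reach the host with suitable credentials. It reads the Using_Gtid field that MariaDB reports in SHOW SLAVE STATUS.

```shell
# Print the Using_Gtid value reported by a MariaDB host.
# Prints "none" when the host has no replication configured at all.
# gtid_mode is a hypothetical helper name, not part of any existing tooling.
gtid_mode() {
    # Fail early if the query itself fails, rather than reporting "none".
    out=$(mysql -h "$1" -e 'SHOW SLAVE STATUS\G') || return 1
    # Lines in \G output look like "   Using_Gtid: Slave_Pos".
    mode=$(printf '%s\n' "$out" | awk '$1 == "Using_Gtid:" { print $2 }')
    echo "${mode:-none}"
}
```

MariaDB reports No, Slave_Pos or Current_Pos in this field, so a primary with GTID disabled should print No, and a replica with GTID enabled should print one of the other two values.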

The day of the switchover

Before the switchover

  • Downtime all db primaries just before the switch, so that read-only alerts won't fire (T285803).

After the switchover

  • Manually fix parsercache hosts and x2 in tendril: T266723
  • Submit a puppet patch changing host-down alerting:
    • Background: gerrit:736415
    • Move profile::monitoring::is_critical: true from hieradata/role/<old dc>/mariadb/* to hieradata/role/<new dc>/mariadb/*
    • Re-run puppet: sudo cumin 'A:db-core or A:db-parsercache' 'run-puppet-agent -q'
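
When preparing the host-down alerting patch, it can help to list which hiera files currently carry the key. The following is a minimal sketch, assuming it is run from the root of a puppet repository checkout; the function name critical_role_files is hypothetical.

```shell
# List hiera files under a DC's mariadb role directory that set
# profile::monitoring::is_critical to true.
# critical_role_files is a hypothetical helper; run it from the root of
# a puppet repository checkout. The argument is the DC name, e.g. eqiad.
critical_role_files() {
    grep -rlE '^profile::monitoring::is_critical:[[:space:]]*true' \
        "hieradata/role/$1/mariadb" | sort
}
```

Before the patch, calling it with the old DC name shows the files to edit. After the patch, the same call should print nothing, and calling it with the new DC name should list the files that now carry the key.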

The days after the switchover

  • 2 days after: disable circular replication again, and update section_params in hieradata/common/profile/mariadb.yaml again. E.g. gerrit:721421
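
Whether circular replication is currently in place between two primaries can be checked from the Master_Host field of SHOW SLAVE STATUS on each side. The sketch below is illustrative only: master_host and is_circular are hypothetical helper names, and it assumes the mysql client can reach both hosts and that the host names are given in the same form used in the replication configuration.

```shell
# Print the Master_Host a MariaDB instance replicates from (empty if none).
# master_host and is_circular are hypothetical helpers for illustration.
master_host() {
    mysql -h "$1" -e 'SHOW SLAVE STATUS\G' \
        | awk '$1 == "Master_Host:" { print $2 }'
}

# Succeed only if the two given primaries replicate from each other.
is_circular() {
    [ "$(master_host "$1")" = "$2" ] && [ "$(master_host "$2")" = "$1" ]
}
```

Called with the eqiad and codfw primaries of one section, is_circular should succeed while circular replication is enabled and fail once it has been disabled.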



This page is a part of the SRE Data Persistence technical documentation
(go here for a list of all our pages)