You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Difference between revisions of "Switch Datacenter/Coordination"

From Wikitech-static
imported>Legoktm
(→‎Scheduling: use zonestamp)
 
imported>Legoktm
(→‎Scheduling: +spicerack coordination with volans)
 
(One intermediate revision by one other user not shown)

Latest revision as of 15:57, 9 September 2021

Planning and executing a non-emergency DC switchover requires coordination among various SRE subteams, RelEng, CommRel and others. We aim to make switchovers a non-event from a user perspective, but we are not there yet operationally.

Scheduling

Ideally, start this process about 2 months before the desired date.

  • Check the WMF Staff Calendar, global holidays and the deployment yearly calendar for potential conflicts.
  • Ask the DBA, DCOps, RelEng, Network Engineering in Infrastructure Foundations and CommRel teams to verify the date works with them.
    • Do this by scheduling a kickoff meeting with representatives from the affected teams, where a range of dates can be proposed for the switchover and the switchback. Follow up with them and set a final date the following week.
  • Create a Phabricator task (e.g. T281515) and update the Switch Datacenter page with the schedule (use zonestamp links for convenience).
    • Typically: Services Monday 14:00 UTC, Traffic Monday 15:00 UTC, MediaWiki Tuesday 14:00 UTC
    • Same for the switchback: Services Monday 14:00 UTC, Traffic Monday 15:00 UTC, MediaWiki Tuesday 14:00 UTC
      • Typically 6+ weeks later
  • Announce the tentative date to sre@wikimedia.org and invite comments and concerns; allow 1 week for comments.
  • Announce dates on wikitech-l and ops mailing lists.
  • Send calendar invitations to sre@wikimedia.org.
  • Add the dates and times to the SRE Monday Update under the "Service Interruptions - Any other maintenance and expansions?" heading.
  • Once the week is listed on the Deployment calendar, add the events there (example)
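The timing conventions above can be turned into concrete UTC start times with a little date arithmetic. The following is a minimal Python sketch. It assumes the "typical" slots listed above. It also assumes a switchback exactly 6 weeks after the switchover (the minimum of "6+ weeks") and uses 8 weeks as an approximation of "2 months" of lead time. The names `TYPICAL_SLOTS`, `slot_times` and `plan` are hypothetical and are not part of any existing SRE tooling.

```python
from datetime import date, datetime, timedelta, timezone

# Typical slots from the schedule above: step -> (days after the Monday, UTC hour).
# Hypothetical helper data; adjust if the agreed slots differ.
TYPICAL_SLOTS = {
    "Services": (0, 14),   # Monday 14:00 UTC
    "Traffic": (0, 15),    # Monday 15:00 UTC
    "MediaWiki": (1, 14),  # Tuesday 14:00 UTC
}


def slot_times(monday: date) -> dict:
    """Return the UTC start time of each step for the week beginning on `monday`."""
    if monday.weekday() != 0:  # date.weekday(): Monday is 0
        raise ValueError(f"{monday} is not a Monday")
    return {
        step: datetime(monday.year, monday.month, monday.day, hour, tzinfo=timezone.utc)
        + timedelta(days=day_offset)
        for step, (day_offset, hour) in TYPICAL_SLOTS.items()
    }


def plan(switchover_monday: date, weeks_until_switchback: int = 6) -> dict:
    """Sketch of the overall timeline.

    Assumptions: 8 weeks approximates the "2 months" of lead time, and the
    switchback defaults to the minimum 6 weeks after the switchover.
    """
    switchback_monday = switchover_monday + timedelta(weeks=weeks_until_switchback)
    return {
        "kickoff_by": switchover_monday - timedelta(weeks=8),
        "switchover": slot_times(switchover_monday),
        "switchback": slot_times(switchback_monday),
    }


if __name__ == "__main__":
    # Arbitrary example date, not a real scheduled switchover.
    timeline = plan(date(2030, 3, 4))
    print("Kickoff by:", timeline["kickoff_by"].isoformat())
    for phase in ("switchover", "switchback"):
        for step, start in timeline[phase].items():
            print(f"{phase:<10} {step:<9} {start.isoformat()}")
```

For the example date, the switchover Services step starts at 2030-03-04T14:00:00+00:00 and the switchback MediaWiki step at 2030-04-16T14:00:00+00:00. The resulting times can then be pasted into zonestamp links by hand.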

2 weeks before the selected date:

  • Announce dates on wikitech-l and ops mailing lists.
  • Coordinate with Volans to ensure that any spicerack/wmflib releases are done before they are needed.
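The due date for the "2 weeks before" tasks follows directly from the selected date. Below is a minimal Python sketch that assumes a fixed 14-day lead time. It uses a hypothetical `two_week_checklist` helper, not an existing tool, to pair each task above with its due date.

```python
from datetime import date, timedelta

# Tasks listed above for two weeks before the selected date.
TWO_WEEK_TASKS = (
    "Announce dates on wikitech-l and ops mailing lists",
    "Coordinate with Volans on spicerack/wmflib releases",
)


def two_week_checklist(selected_date: date, lead: timedelta = timedelta(weeks=2)) -> list:
    """Return (due date, task) pairs, all due `lead` before `selected_date`."""
    due = selected_date - lead
    return [(due, task) for task in TWO_WEEK_TASKS]


if __name__ == "__main__":
    # Arbitrary example date, not a real scheduled switchover.
    for due, task in two_week_checklist(date(2030, 3, 4)):
        print(due.isoformat(), task)
```

For the example date, both tasks fall due on 2030-02-18.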