
Difference between revisions of "Wikidata Query Service/Streaming Updater Rollout Plan"

imported>Aklapper
((Please add years to docs))
 
imported>DCausse
(Replaced content with "Moved to phabricator at phab:T288231")
 
(One intermediate revision by the same user not shown)
{{Template:Draft}}
Moved to phabricator at [[phab:T288231]]
 
Important notes:
* Week of Sept. 13, 2021: planned datacenter switch codfw -> eqiad
* The revision map used to generate the initial state is available on Fridays (7am UTC)
* Dumps should be considered available on Fridays on the mirror
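
Since both the revision map and the dumps follow a Friday cadence, the earliest possible start of W0 can be worked out from the current time. The following is a minimal sketch, not part of the original plan; the function name <code>next_revision_map_time</code> is hypothetical, and it assumes the caller passes a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

FRIDAY = 4               # datetime.weekday(): Monday is 0, so Friday is 4
AVAILABLE_HOUR_UTC = 7   # "Fridays (7am UTC)" from the notes above


def next_revision_map_time(now: datetime) -> datetime:
    """Return the first Friday 07:00 UTC at or after `now`.

    Assumption: `now` is timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    days_ahead = (FRIDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=AVAILABLE_HOUR_UTC, minute=0, second=0, microsecond=0
    )
    # On a Friday after 07:00 UTC the slot has already passed; use next week.
    if candidate < now:
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_revision_map_time(datetime.now(timezone.utc)).isoformat())
```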
 
 
= General process =
* Notify users on ML&Wiki
* Import
* Switch traffic to eqiad only
* Migrate all machines in codfw
* Switch traffic to codfw (user impact starts)
* Notify users on ML&Wiki (response)
* Migrate all machines in eqiad
* Re-open traffic to both DCs
 
= Details =
Before we start (week before):
* query-preview.wikidata.org should be closed
* stop the streaming updater if still running on k8s eqiad/codfw as part of testing k8s
 
Send a message to users on ML&wiki with an estimate once W0 is known.
 
Deployment plan:
* W0:Friday
** depool wdqs2008 and ship a config patch to switch to the streaming-updater-consumer
** start import on wdqs1009 and wdqs2008 with <code>--skolemize</code>: best case 10 days (import from 2 machines to maximize the chances of success)
** start import on wdqs2008: best case 10 days
** from stat1004 generate the initial state to <code>swift://rdf-streaming-updater-eqiad.thanos-swift/wikidata/savepoints/initial_state_$IMPORT_DATE</code> and <code>swift://rdf-streaming-updater-codfw.thanos-swift/wikidata/savepoints/initial_state_$IMPORT_DATE</code>
** start the updater producer on k8s@eqiad and k8s@codfw using the corresponding savepoint (note the time)
* W1: monitor the import and react quickly (the import process is known to be fragile)
* W2:Monday
** if both imports worked, start the updater-consumer on wdqs1009 and wdqs2008 (if not automatically started)
** if only one import worked use the data-transfer to ship the data
** wait for the lag to catch up (estimate: 1 to 2 days)
* W2:Wednesday
** switch all traffic to eqiad
** start data-transfer + updater-consumer activation wdqs2008 -> all codfw machines (estimate: 2 to 3 days: 3h/machine * 7)
* W3:Monday
** Switch traffic to codfw: '''users are now impacted'''
** Notify users
** Monitor that everything works fine
* W3:Wednesday
** start data-transfer + updater-consumer activation wdqs1009 -> all eqiad machines (estimate: 2 to 3 days: 3h/machine * 10)
** re-enable eqiad
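
The milestones above are relative to the W0 Friday, so concrete dates (for the user notification, for instance) follow once W0 is known. The sketch below is illustrative only: the day offsets are derived from the week/day labels in the plan (the W2 Monday falls 10 days after the W0 Friday, matching the best-case import time), the names <code>schedule</code> and <code>transfer_hours</code> are hypothetical, and the transfer figure assumes machines are handled one after another at 3h each.

```python
from datetime import date, timedelta

# Day offsets from the W0 Friday, derived from the week/day labels in the plan.
MILESTONES = {
    "W0:Friday": 0,       # start imports, generate initial state
    "W2:Monday": 10,      # start updater-consumer, wait for lag
    "W2:Wednesday": 12,   # traffic to eqiad only, migrate codfw
    "W3:Monday": 17,      # traffic to codfw, users impacted
    "W3:Wednesday": 19,   # migrate eqiad, re-enable eqiad
}
HOURS_PER_MACHINE = 3     # "3h/machine" from the plan


def schedule(w0_friday: date) -> dict[str, date]:
    """Map each milestone label to a calendar date."""
    if w0_friday.weekday() != 4:  # Monday is 0, Friday is 4
        raise ValueError("W0 must start on a Friday")
    return {name: w0_friday + timedelta(days=off) for name, off in MILESTONES.items()}


def transfer_hours(machines: int) -> int:
    """Total hours, assuming transfers run sequentially (an assumption)."""
    return HOURS_PER_MACHINE * machines


if __name__ == "__main__":
    # Hypothetical W0 date, chosen only to illustrate the arithmetic.
    for name, day in schedule(date(2021, 8, 13)).items():
        print(name, day.isoformat())
    print("codfw:", transfer_hours(7), "h; eqiad:", transfer_hours(10), "h")
```

Under the sequential assumption the raw transfer time is 21h for the 7 codfw machines and 30h for the 10 eqiad machines; the plan's estimate of 2 to 3 days leaves room on top of that.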

Latest revision as of 11:58, 15 September 2021

Moved to phabricator at phab:T288231