
Difference between revisions of "Server Admin Log"

imported>Stashbot
(legoktm: reverted cumin1001 spicerack live hacks)
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
 
(105 intermediate revisions by 3 users not shown)
== 2021-06-24 ==
== 2021-10-21 ==
* 23:02 legoktm: reverted cumin1001 spicerack live hacks
* 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:57 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 23:38 jforrester@deploy1002: Synchronized w/fatal-error.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 22:55 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 23:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:55 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (
* 23:37 jforrester@deploy1002: Synchronized w/static.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 23:36 jforrester@deploy1002: Synchronized multiversion/: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 55s)
* 23:34 jforrester@deploy1002: Synchronized docroot/noc/conf/index.php: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 54s)
* 23:33 jforrester@deploy1002: Synchronized wmf-config: Config: [[gerrit:730038{{!}}build: Upgrade composer testing stack to latest as used Wikimedia-wide]] (duration: 00m 55s)
* 23:32 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:25 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:730946{{!}}CommonSettings: Drop legacy CentralAuth config flag, never read (T277932)]] (duration: 00m 55s)
* 23:18 thcipriani@deploy1002: Synchronized tests/multiversion/StaticSettingsTest.php: Config: [[gerrit:720362{{!}}Add new config names for CentralAuth denylist controls (T277932)]] (duration: 00m 55s)
* 23:15 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720362{{!}}Add new config names for CentralAuth denylist controls (T277932)]] (duration: 00m 55s)
* 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:42 mutante: [[phab:T294038|T294038]] [krb1001:~] $ sudo manage_principals.py create effeietsanders ... Principal successfully created.  . .Successfully sent email
* 21:44 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS (duration: 02m 47s)
* 21:41 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@13448f1] (wcqs): Deploy 0.3.90 to WCQS
* 20:54 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy (duration: 00m 13s)
* 20:53 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy
* 20:53 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy (duration: 00m 35s)
* 20:52 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@1309a97] (wcqs): dry run wcqs deploy
* 20:04 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 20:04 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 20:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 20:02 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 19:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:42 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Update $wgTimelineFonts for new path to unifont in Shellbox container ([[phab:T293050|T293050]]) (duration: 00m 55s)
* 19:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:35 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:31 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 19:23 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 19:10 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b2912b7]: deploy 0.3.90, incl oauth, to wcqs (duration: 00m 23s)
* 19:09 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b2912b7]: deploy 0.3.90, incl oauth, to wcqs
* 19:07 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b2912b7]: (no justification provided) (duration: 00m 08s)
* 19:07 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b2912b7]: (no justification provided)
* 18:53 urbanecm: Deploy security patch for [[phab:T285116|T285116]] (wmf.4, wmf.5)
* 18:53 mutante: dumpsdata1003 - sudo systemctl reset-failed to clear Icinga alert about failed cleanup_tmpdumps.service
* 17:55 mutante: that's a key for https://www.worldcat.org/whatis/default.jsp btw for those wondering
* 17:53 mutante: citoid - replaced "wskey" for worldcat in private repo as requested on [[phab:T294010|T294010]] (is in 4 places, 3 for deployment_server/k8s and one remnant for scb)
* 17:53 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:52 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 17:50 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 16:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 16:12 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 16:07 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 16:06 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (2/3) (duration: 00m 54s)
* 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:04 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732669{{!}}Remove dispatchViaJobs repo setting (T292604)]] (1/3) (duration: 00m 56s)
* 16:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 16:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:01 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 15:59 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (2/3) (duration: 00m 55s)
* 15:58 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/includes/: Backport: [[gerrit:732668{{!}}Remove dispatchViaJobsPruneChangesTableInJobEnabled repo setting (T292604)]] (1/3) (duration: 00m 57s)
* 15:43 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:21 robh@cumin1001: START - Cookbook sre.dns.netbox
* 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/tests/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (3/3) (duration: 00m 56s)
* 15:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:13 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.5/extensions/Wikibase/repo/config/: Backport: [[gerrit:732667{{!}}Remove dispatchViaJobsAllowedClients repo setting (T292604)]] (1/3) (duration: 00m 54s)
* 15:12


== 2021-06-23 ==
== 2021-09-23 ==
* 23:59 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 23:38 foks: running wm-scripts/mcdc2021/populateEditCount.php on each wiki (s1 thru s8 simultaneously) https://phabricator.wikimedia.org/T291668
* 23:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1438.eqiad.wmnet with reason: REIMAGE
* 22:58 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 22:58 foks: creating `mcdc2021_edits` table on each wiki for elections voterlist https://phabricator.wikimedia.org/T291668
* 23:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1437.eqiad.wmnet with reason: REIMAGE
* 22:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1438.eqiad.wmnet with reason: REIMAGE
* 22:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 22:33 reedy@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/SecurePoll/cli/wm-scripts/: [[phab:T291668|T291668]] (duration: 00m 57s)
* 23:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1437.eqiad.wmnet with reason: REIMAGE
* 22:27 ryankemper: [[phab:T280001|T280001]] `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>puppetmaster*<nowiki>}</nowiki>' 'sudo rm -fv /var/run/confd-template/.wcqs*'` complete, forcing recheck
* 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 22:27 ryankemper: [[phab:T280001|T280001]] The pooling of the `wcqs*` hosts has gotten `/srv/config-master/pybal/$<nowiki>{</nowiki>DC<nowiki>}</nowiki>/wcqs` to render, but we need to clear away the stale error files to get rid of the associated warnings `Stale template error files present for '/srv/config-master/pybal/$<nowiki>{</nowiki>DC<nowiki>}</nowiki>/wcqs'` => `sudo rm -fv /var/run/confd-template/.wcqs*`
* 23:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 22:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 22:18 ryankemper: [[phab:T280001|T280001]] `ryankemper@puppetmaster1001:/srv$ sudo confctl select 'name=wcqs.*' set/pooled=yes:weight=10`
* 23:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 22:17 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=wcqs.*
* 23:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1433.eqiad.wmnet with reason: REIMAGE
* 22:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1432.eqiad.wmnet with reason: REIMAGE
* 22:13 ryankemper: [[phab:T280001|T280001]] [codfw] `root@lvs2010:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443` and `root@lvs2009:/home/ryankemper# ipvsadm -Dt 10.2.2.67:443`
* 23:46 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 22:13 ryankemper: [[phab:T280001|T280001]] [eqiad] `root@lvs1016:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443` and `root@lvs1015:/home/ryankemper# ipvsadm -Dt 10.2.1.67:443`
* 23:45 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1431.eqiad.wmnet with reason: REIMAGE
* 22:06 ryankemper: [[phab:T280001|T280001]] Restarted pybal on low-traffic primaries: `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>lvs2009*,lvs1015*<nowiki>}</nowiki>' 'sudo systemctl restart pybal'`
* 23:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1433.eqiad.wmnet with reason: REIMAGE
* 22:06 ryankemper: [[phab:T280001|T280001]] Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015`
* 23:43 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1432.eqiad.wmnet with reason: REIMAGE
* 22:05 ryankemper: [[phab:T280001|T280001]] [Cleanup required] `TCP  10.2.1.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP  10.2.2.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` (erroneous)
* 23:42 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1429.eqiad.wmnet with reason: REIMAGE
* 22:05 ryankemper: [[phab:T280001|T280001]] [Sanity check] `TCP  10.2.2.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n` and `TCP  10.2.1.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` as expected
* 23:41 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1430.eqiad.wmnet with reason: REIMAGE
* 22:04 ryankemper: [[phab:T280001|T280001]] Restarted pybal on low-traffic backups: `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>lvs2010*,lvs1016*<nowiki>}</nowiki>' 'sudo systemctl restart pybal'`
* 23:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1431.eqiad.wmnet with reason: REIMAGE
* 22:03 ryankemper: [[phab:T280001|T280001]] Restarting pybal on low-traffic backups `lvs2010` and `lvs1016`...
* 23:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1428.eqiad.wmnet with reason: REIMAGE
* 22:03 ryankemper: [[phab:T280001|T280001]] Ran puppet on all lvs hosts: `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`
* 23:39 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1430.eqiad.wmnet with reason: REIMAGE
* 22:00 ryankemper: [[phab:T280001|T280001]] Running puppet on all lvs hosts: `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`...
* 23:37 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: REIMAGE
* 21:59 ryankemper: [[phab:T280001|T280001]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/723315, ran puppet agent on `wcqs*` to fix `local lo:LVS destination IPs`
* 23:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1427.eqiad.wmnet with reason: REIMAGE
* 21:59 ryankemper: [[phab:T280001|T280001]] Swapped the netbox IPAM addresses back, after erroneously swapping them earlier. `sre.dns.netbox` cookbook run complete as well
* 23:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1428.eqiad.wmnet with reason: REIMAGE
* 21:57 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1426.eqiad.wmnet with reason: REIMAGE
* 21:53 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
* 23:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1427.eqiad.wmnet with reason: REIMAGE
* 21:43 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1425.eqiad.wmnet with reason: REIMAGE
* 21:43 foks: altering some rows in the `securepoll_elections` table on metawiki
* 23:31 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1426.eqiad.wmnet with reason: REIMAGE
* 21:36 ryankemper: [[phab:T280001|T280001]] `sre.dns.netbox` run complete, netbox IP mixup *should* be resolved
* 23:30 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1424.eqiad.wmnet with reason: REIMAGE
* 21:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:29 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1425.eqiad.wmnet with reason: REIMAGE
* 21:27 ryankemper: [[phab:T280001|T280001]] `ryankemper@cumin1001:~$ sudo -i cookbook sre.dns.netbox -t [[phab:T280001|T280001]] 'Fix swapped wcqs.svc.[eqiad,codfw].wmnet'` in progress (note: no `sudo authdns-update` will be necessary because that's just for `operations/dns` repo changes; we only need to run the netbox cookbook)
* 23:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1423.eqiad.wmnet with reason: REIMAGE
* 21:24 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
* 23:27 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1424.eqiad.wmnet with reason: REIMAGE
* 21:23 ryankemper: [[phab:T280001|T280001]] Swapped IPs of https://netbox.wikimedia.org/ipam/ip-addresses/9062/ and https://netbox.wikimedia.org/ipam/ip-addresses/9063; this should fix the issue where eqiad and codfw were swapped in netbox (my error)...still need to run netbox cookbook and possibly a manual `sudo authdns-update`
* 23:26 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 21:19 ryankemper: The pybal side of the changes looks good, but I made a mistake with the assigning of IPs in netbox; `wcqs.svc.eqiad.wmnet` is routing to where codfw should go and vice versa. Fixing...
* 23:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1423.eqiad.wmnet with reason: REIMAGE
* 21:05 ryankemper: [[phab:T280001|T280001]] Restarted pybal on low-traffic primaries: `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>lvs2009*,lvs1015*<nowiki>}</nowiki>' 'sudo systemctl restart pybal'`
* 23:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 21:04 ryankemper: [[phab:T280001|T280001]] Restarting pybal on low-traffic primaries `lvs2009` and `lvs1015`...
* 23:23 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 21:04 ryankemper: [[phab:T280001|T280001]] Waited 120s and checked https://icinga.wikimedia.org/alerts, proceeding to primary low-traffic hosts `lvs2009` and `lvs1015`
* 23:22 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.9
* 21:00 ryankemper: [[phab:T280001|T280001]] Sanity check of `sudo ipvsadm -L -n` on low-traffic backups `lvs2010` and `lvs1016` looks good, proceeding
* 23:21 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1420.eqiad.wmnet with reason: REIMAGE
* 21:00 ryankemper: [[phab:T280001|T280001]] `TCP  10.2.1.67:443 wrr` shows up on `ryankemper@lvs1016:~$ sudo ipvsadm -L -n ` and `TCP  10.2.2.67:443 wrr` shows up on `ryankemper@lvs2010:~$ sudo ipvsadm -L -n` as expected
* 23:21 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 20:58 brennen: canceling backport training window for 2021-09-23
* 23:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1419.eqiad.wmnet with reason: REIMAGE
* 20:54 ryankemper: [[phab:T280001|T280001]] Restarted pybal on backup low-traffic hosts: `ryankemper@cumin1001:~$ sudo cumin 'P<nowiki>{</nowiki>lvs2010*,lvs1016*<nowiki>}</nowiki>' 'sudo systemctl restart pybal'`
* 23:19 dduvall: rolling back 1.37.0-wmf.11 from group1 ([[phab:T281152|T281152]]) due to reoccurrence of "PHP Notice: Undefined index: frameCount" now at PNGHandler.php:156 ([[phab:T285431|T285431]])
* 20:53 ryankemper: [[phab:T280001|T280001]] Restarting pybal on backup low-traffic hosts `lvs2010` and `lvs1016`...
* 23:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1420.eqiad.wmnet with reason: REIMAGE
* 20:53 ryankemper: [[phab:T280001|T280001]] Ran puppet on all lvs hosts => `ryankemper@cumin1001:~$ sudo cumin 'O:lvs::balancer' 'sudo run-puppet-agent'`
* 23:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1418.eqiad.wmnet with reason: REIMAGE
* 20:47 ryankemper: [[phab:T280001|T280001]] Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/723254 to proceed with `lvs_setup` state change; will be restarting low-traffic lvs hosts shortly
* 23:17 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1419.eqiad.wmnet with reason: REIMAGE
* 20:04 dduvall: 1.38.0-wmf.1 promoted to all wikis. no new errors or rising rates ([[phab:T281165|T281165]])
* 23:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1417.eqiad.wmnet with reason: REIMAGE
* 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1418.eqiad.wmnet with reason: REIMAGE
* 19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:14 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 04s)
* 19:50 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.1
* 23:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1416.eqiad.wmnet with reason: REIMAGE
* 19:40 kostajh: UTC morning backport window done
* 23:13 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
* 19:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:13 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1417.eqiad.wmnet with reason: REIMAGE
* 19:39 kharlan@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:723194{{!}}Suggested Edits: Update editor preference for tasks that shouldn't open the editor by default (T291020)]] (duration: 01m 05s)
* 23:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1415.eqiad.wmnet with reason: REIMAGE
* 19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:11 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1416.eqiad.wmnet with reason: REIMAGE
* 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:10 dduvall: re-rolling group1 to 1.37.0-wmf.11 ([[phab:T281152|T281152]]) following deployment of blocker fixes
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:09 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1415.eqiad.wmnet with reason: REIMAGE
* 19:02 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I3323ce3d4446a2}} (duration: 01m 07s)
* 23:05 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes/media/GIFHandler.php: Backport: [[gerrit:701187{{!}}Check for _error in getting metadata array in GIFHandler (T285431)]] (duration: 01m 06s)
* 18:58 ryankemper: [[phab:T280001|T280001]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/721089 to see if it resolves the `confd` error that popped up
* 22:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes/media/PNGHandler.php: Backport: [[gerrit:701186{{!}}Check for _error in getting metadata array in PNGHandler (T285431)]] (duration: 01m 06s)
* 18:57 krinkle@deploy1002: Synchronized wmf-config/logging.php: {{Gerrit|I2cd81a5165ea14c}} (duration: 01m 05s)
* 22:26 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1414.eqiad.wmnet with reason: REIMAGE
* 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:24 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1414.eqiad.wmnet with reason: REIMAGE
* 18:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:45 sbassett: Deployed updated security patch for [[phab:T285190|T285190]] to wmf.9 and wmf.11
* 18:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:55 ejegg: updated payments-wiki from {{Gerrit|42cfbe832d}} to {{Gerrit|d9892207c1}}
* 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:38 eileen: civicrm revision changed from {{Gerrit|53d103f672}} to {{Gerrit|bd906975f0}}, config revision is {{Gerrit|6a88618c3e}}
* 17:31 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 20:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:17 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:42 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.9
* 17:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:39 dduvall: rolling back wmf.11 from group1 due to increase in logspam possibly related to noted risky patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/693298 (cc [[phab:T281152|T281152]] and patch contact Amir1)
* 17:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:35 herron: rebooting kafkamon hosts for updates
* 17:06 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 19:26 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 06s)
* 17:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:25 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
* 16:59 volans: uploaded spicerack_1.0.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 19:20 dduvall: preparing to promote wmf.11 group1 ([[phab:T281152|T281152]]) cc'ing risky patch contacts Amir1, Krinkle, DannyS712
* 16:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6e0f5ad88cb7d99e3b4cf48bccb6e34cdcc64fa5}}: Enable GrowthExperiments donor landing page for testing ([[phab:T284799|T284799]]) (duration: 01m 05s)
* 16:38 ryankemper: [[phab:T280001|T280001]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/713959, running puppet on `*w*qs*` (i.e. wcqs and wdqs)
* 19:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/: {{Gerrit|2338e53}}: Revert "Add custom signup flow for donors" ([[phab:T284740|T284740]]; [[phab:T284800|T284800]]; [[phab:T285281|T285281]]) (duration: 01m 06s)
* 16:13 elukey: reboot an-worker1096 to see if megacli status for a new disk changes - [[phab:T290805|T290805]]
* 18:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:09 brennen: gitlab1001: reverting [[gerrit:714382{{!}}gitlab cas: uid instead of CN; add nickname_key]] for [[phab:T288392|T288392]], as existing user logins are broken.
* 18:55 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/: REVERT: {{Gerrit|76e5fc91083736d14049a05ed227cdea015c113e}}: Add custom signup flow for donors ([[phab:T284740|T284740]]; [[phab:T284800|T284800]]; [[phab:T285281|T285281]]) (duration: 00m 38s)
* 15:54 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ echo 'https://query.wikidata.org/querybuilder/' {{!}} mwscript purgeList.php # [[phab:T285761|T285761]]
* 18:55 urbanecm@deploy1002: sync-file aborted: REVERT: {{Gerrit|76e5fc91083736d14049a05ed227cdea015c113e}}: Add custom signup flow for donors ([[phab:T284740|T284740]]; [[phab:T284800|T284800]]; [[phab:T285281|T285281]]) (duration: 00m 01s)
* 15:54 brennen: gitlab1001: brief downtime to apply [[gerrit:714382{{!}}gitlab cas: uid instead of CN; add nickname_key]] for [[phab:T288392|T288392]]
* 18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:54 urbanecm@deploy1002: Scap failed!: 6/9 canaries failed their endpoint checks(https://en.wikipedia.org)
* 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:53 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: {{Gerrit|76e5fc91083736d14049a05ed227cdea015c113e}}: Add custom signup flow for donors ([[phab:T284740|T284740]]; [[phab:T284800|T284800]]; [[phab:T285281|T285281]]) (duration: 01m 07s)
* 15:09 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 18:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/WikimediaEvents/extension.json: {{Gerrit|01f034b466ff7bdd274e18c9ad7cefe88245548d}}: Finalize WMDEBanner* schema migration to Event Platform ([[phab:T282562|T282562]]) (duration: 01m 05s)
* 15:09 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 18:35 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: {{Gerrit|17efbafc300c6745928415d7a10f2dad8f406de4}}: EditGrowthConfig: Suggested edit "Learn more" link should support interwiki ([[phab:T279886|T279886]]; [[phab:T285385|T285385]]) (duration: 01m 06s)
* 15:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3a2fc6e3687817db5c774e7e527a10dd9e974138}}: Enable $wgSecurePollSingleTransferableVoteEnabled on beta sites (duration: 01m 05s)
* 15:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 18:31 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@0535b94]: expect eventgate events for all datacenters, second try (duration: 09m 11s)
* 14:58 reedy@deploy1002: Synchronized wmf-config/reverse-proxy-staging.php: [[phab:T291643|T291643]] (duration: 01m 05s)
* 18:22 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@0535b94]: expect eventgate events for all datacenters, second try
* 14:19 moritzm: removed routers filter for mx1001, reimage to bullseye complete [[phab:T286911|T286911]]
* 18:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b4a786799a249d2012b9c47553a0b64fdce1bac0}}: Make Growth features available to newcomers at lvwiki and skwiki ([[phab:T278191|T278191]]; [[phab:T284149|T284149]]) (duration: 01m 06s)
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 17:58 herron: beginning rolling reboots of kafka-main100[1-5] for updates
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 17:57 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable canary events for NavigationTiming ext streams - [[phab:T271208|T271208]], [[phab:T266798|T266798]] (duration: 01m 29s)
* 14:14 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 17:07 herron: beginning rolling reboots of kafka-main200[1-5] for updates
* 13:53 effie: upgrade php7.2 on codfw - [[phab:T291052|T291052]]
* 16:42 XioNoX: re-start sending traffic on the codfw-eqsin Telia transport link
* 13:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for mx1001.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 15:17 topranks: Removing peering to AS64050 / "BGP Consultancy Pte Ltd" at AMS-IX (cr2-esams).  Peer has left IX.
* 13:36 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx1001.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 14:54 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # [[phab:T170737|T170737]], SHARD=s1
* 13:34 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx1001.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 14:53 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # [[phab:T170737|T170737]], SHARD=s8
* 13:34 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx1001.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 13:54 effie: rolling restart thanos-fe* to pick up new tegola-vector-tiles account - [[phab:T283049|T283049]]
* 13:28 marostegui: Deploy schema change on s8 codfw wikidatawiki.wb_changes [[phab:T291584|T291584]]
* 13:45 volans: uploaded cumin_4.1.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:27 moritzm: reimaging mx1001 to bullseye [[phab:T286911|T286911]]
* 13:27 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # [[phab:T170737|T170737]], SHARD=s4
* 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: reimage
* 12:59 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # [[phab:T170737|T170737]], SHARD=s3
* 13:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: reimage
* 12:46 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # [[phab:T170737|T170737]], SHARD=s7
* 13:23 jbond: merge refactor of resolv.conf puppet class - (gerrit 717241)
* 12:35 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # [[phab:T170737|T170737]], SHARD=s6
* 13:14 marostegui: Deploy schema change on s4 <nowiki>{</nowiki>commonswiki,testcommonswiki<nowiki>}</nowiki>.wb_changes [[phab:T291584|T291584]]
* 12:26 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # [[phab:T170737|T170737]], SHARD=s5
* 13:11 marostegui: Deploy schema change on s3 testwikidatawiki.wb_changes [[phab:T291584|T291584]]
* 12:15 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist s2 recountCategories.php --mode=pages && foreachwikiindblist s2 recountCategories.php --mode=subcats && foreachwikiindblist s2 recountCategories.php --mode=files # [[phab:T170737|T170737]]
* 13:09 elukey: update pcc facts (after change in puppetdb's fact filter list, to allow partitions for analytics)
* 11:46 XioNoX: Simplify labs-in4/6 firewall filters - CR700939
* 11:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:10 topranks: Removing peering to AS39651 / "Com Hem AB" at AMS-IX (cr2-esams).  Peer has left IX.
* 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:44 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:19 marostegui: Upgrade db2081 db2082 db2083 db2084 db2091 db2152 [[phab:T290868|T290868]]
* 09:35 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@9f16a6b]: (no justification provided) (duration: 00m 20s)
* 11:16 kostajh: UTC morning backport and config deploys done
* 09:35 mbsantos@deploy1002: Started deploy [kartotherian/deploy@9f16a6b]: (no justification provided)
* 11:15 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:722961{{!}}GrowthExperiments: Place new dewiki accounts in control group (T288420)]] (duration: 01m 06s)
* 09:22 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:10 jynus: restart and upgrade db2141 [[phab:T290865|T290865]]
* 08:48 volans: sudo systemctl start ferm.service on thanos-fe2002 (DNS query timeout)
* 10:55 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 08:34 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@9f16a6b]: (no justification provided) (duration: 00m 14s)
* 10:53 moritzm: mx1001 filtered on the routers for forthcoming reimage to bullseye [[phab:T286911|T286911]]
* 08:34 mbsantos@deploy1002: Started deploy [kartotherian/deploy@9f16a6b]: (no justification provided)
* 10:52 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 07:57 kart_: cxserver: Removed Matxin MT support and added more language support to Elia MT ([[phab:T285199|T285199]],  [[phab:T284900|T284900]])
* 10:51 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 07:54 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:50 marostegui: Upgrade db2102 db2116 db2130 db2145 db2146
* 07:49 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:47 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 07:46 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:27 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
* 07:26 legoktm: uploaded mailman3_3.3.3-1~bpo10+6_amd64.changes on apt1001
* 09:59 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:08 legoktm: updating mailman packages on lists1001 and restarting ([[phab:T285120|T285120]], [[phab:T280889|T280889]])
* 09:55 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 06:56 ryankemper: [WDQS] `ryankemper@wdqs1006:~$ sudo pool`
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 06:37 ryankemper: [WDQS] `ryankemper@wdqs2001:~$ sudo pool`
* 09:52 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16703 and previous config saved to /var/cache/conftool/dbconfig/20210623-062819-root.json
* 09:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16702 and previous config saved to /var/cache/conftool/dbconfig/20210623-061316-root.json
* 09:40 moritzm: reinstalling mx2002 (test server) to validate bullseye installs are fixed
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16701 and previous config saved to /var/cache/conftool/dbconfig/20210623-055812-root.json
* 09:31 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Start repooling db1100', diff saved to https://phabricator.wikimedia.org/P16700 and previous config saved to /var/cache/conftool/dbconfig/20210623-054252-marostegui.json
* 09:30 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16699 and previous config saved to /var/cache/conftool/dbconfig/20210623-045217-root.json
* 09:29 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 01:04 eileen: process-control config revision is {{Gerrit|6a88618c3e}}
* 08:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:50 eileen: civicrm revision changed from {{Gerrit|c745d4f075}} to {{Gerrit|03bead707d}}, config revision is {{Gerrit|4ab72c1033}}
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:40 legoktm: uploaded new versions of flufl.bounce_4.0-1_amd64.changes hyperkitty_1.3.4-2~bpo10+4_amd64.changes mailman3_3.3.3-1~bpo10+5_amd64.changes mailman-hyperkitty_1.1.0-10~bpo10+1_amd64.changes to apt1001
* 08:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:02 Trey314159: reindexing Portuguese wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete ([[phab:T284185|T284185]])
* 08:04 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Have SyntaxHighlight use Shellbox service on group0 wikis (2/2) ([[phab:T289227|T289227]]) (duration: 01m 05s)
* 08:02 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Have SyntaxHighlight use Shellbox service on group0 wikis (1/2) ([[phab:T289227|T289227]]) (duration: 01m 06s)
* 08:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:54 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename $wmgUseGeSHi to $wmgUseSyntaxHighlight (3/3) (duration: 01m 05s)
* 07:52 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Rename $wmgUseGeSHi to $wmgUseSyntaxHighlight (2/3) (duration: 01m 05s)
* 07:49 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename $wmgUseGeSHi to $wmgUseSyntaxHighlight (1/3) (duration: 01m 06s)
* 07:10 tgr: running `mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=$WIKI --search-index --db-table --statsd` for growthexperiments.dblist wikis
* 07:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 06:59 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 06:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 06:57 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 06:57 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 06:56 marostegui: Upgrade db2116
* 06:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 06:55 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 06:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 06:55 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 06:53 marostegui: Upgrade db2085, db2088 and db2092
* 05:24 marostegui: Optimize ruwiki.logging on codfw [[phab:T286102|T286102]]
* 02:55 eileen: civicrm revision changed from {{Gerrit|14658445a2}} to {{Gerrit|18228490ae}}, config revision is {{Gerrit|77cb7ec866}}
* 02:06 RoanKattouw: Deployed patch for [[phab:T291600|T291600]]
* 01:05 eileen: tools revision changed from {{Gerrit|1d67c52c12}} to {{Gerrit|d90f4c91ee}}
* 00:35 catrope@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/MediaSearch/: Use text() instead of parse() for MediaSearch UI messages ([[phab:T291590|T291590]]) (duration: 01m 08s)
* 00:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2021-06-22 ==
== 2021-09-22 ==
* 23:23 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable canary events for search event streams (duration: 01m 05s)
* 22:51 mutante: mx2001 - re-enabled puppet
* 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7865f27430f8eea2975d7154f6009a9206fc75d6}}: Add unwatchedpages to rollbacker on frwiki ([[phab:T285334|T285334]]) (duration: 01m 06s)
* 20:48 ryankemper: [WDQS] After puppet-merging, running puppet on `miscweb*`, and doing a `ryankemper@mwmaint1002:~$ echo 'https://query.wikidata.org/querybuilder' {{!}} mwscript purgeList.php`, https://query.wikidata.org/querybuilder is working properly again
* 23:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9a594f0ce249e2b4752ea2b8d7c4258bf14ad86a}}: Enable Growth features in dark mode at nlwiki ([[phab:T285254|T285254]]; 3/3) (duration: 01m 07s)
* 20:39 ryankemper: [WDQS] Merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/722958/ which should (hopefully) resolve an issue where https://query.wikidata.org/querybuilder gives a 404, whereas https://query.wikidata.org/querybuilder/ works (due to the trailing slash avoiding the rewrite regex)
* 23:05 urbanecm@deploy1002: Synchronized wmf-config/config/nlwiki.yaml: {{Gerrit|9a594f0ce249e2b4752ea2b8d7c4258bf14ad86a}}: Enable Growth features in dark mode at nlwiki ([[phab:T285254|T285254]]; 2/3) (duration: 01m 05s)
* 20:38 ryankemper: `[WCQS]` `wcqs1001.eqiad.wmnet` is reachable again following the powercycle
* 23:04 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|9a594f0ce249e2b4752ea2b8d7c4258bf14ad86a}}: Enable Growth features in dark mode at nlwiki ([[phab:T285254|T285254]]; 1/3) (duration: 01m 37s)
* 20:20 ryankemper: `[WCQS]` Ran `racadm>>racadm serveraction powercycle` on `wcqs1001.mgmt.eqiad.wmnet`
* 22:42 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript recountCategories.php --wiki=zhwiki --mode=subcats # [[phab:T170737|T170737]]
* 20:18 ryankemper: `[WCQS]` `wcqs1001` is ssh unreachable (https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=wcqs1001&service=SSH), will try restarting from mgmt console
* 22:41 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript recountCategories.php --wiki=zhwiki --mode=pages # [[phab:T170737|T170737]]
* 19:29 dduvall: 1.38.0-wmf.1 promoted to group1. no new errors or rising error rates ([[phab:T281165|T281165]])
* 22:38 urbanecm: mwscript recountCategories.php --wiki=eowiktionary --mode=<nowiki>{</nowiki>pages,subcats,files<nowiki>}</nowiki> ([[phab:T170737|T170737]])
* 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:05 eileen: civicrm revision changed from {{Gerrit|629bd3b7b7}} to {{Gerrit|c745d4f075}}, config revision is {{Gerrit|4ab72c1033}}
* 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:05 ejegg: updated payments-wiki from {{Gerrit|7be0534b91}} to {{Gerrit|42cfbe832d}}
* 19:20 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.1 (duration: 01m 11s)
* 20:46 brennen: gitlab1001: running ansible to deploy [[gerrit:699819{{!}}CAS: stop marking users as external]] ([[phab:T274461|T274461]])
* 19:18 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.1
* 20:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-web1001.eqiad.wmnet with reason: REIMAGE
* 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-web1001.eqiad.wmnet with reason: REIMAGE
* 19:11 dduvall@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/CentralAuth: Backport: [[gerrit:722896{{!}}Avoid $wgUser deprecation warnings (T291515)]] (duration: 01m 06s)
* 20:12 Trey314159: reindexing Portuguese wikis on elastic@eqiad, elastic@codfw, and cloudelastic ([[phab:T284185|T284185]])
* 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:12 Trey314159: reindexing Dutch wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete ([[phab:T284185|T284185]])
* 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:58 brennen: gitlab1001: run ansible to deploy https://gerrit.wikimedia.org/r/c/operations/gitlab-ansible/+/699812 ([[phab:T264231|T264231]])
* 18:32 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/GrowthExperiments/modules/help/ext.growthExperiments.PostEditPanel.js: Post-edit Panel: Set task.pageviews to null rather than undefined ([[phab:T291510|T291510]]) (duration: 01m 05s)
* 19:26 legoktm: set mediawiki-l message acceptance to discard non-member posts instead of reject
* 18:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:09 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.11
* 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:06 dduvall: preparing to promote wmf.11 group0 ([[phab:T281152|T281152]]) cc'ing risky patch contacts Amir1, Krinkle, DannyS712
* 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:01 dduvall@deploy1002: Pruned MediaWiki: 1.37.0-wmf.6 (duration: 03m 35s)
* 18:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:46 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@75d35b4]: revert expect eventgate canary events in all dcs (duration: 04m 23s)
* 18:12 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: logging: send DuplicateParse bucket to Logstash (duration: 01m 05s)
* 18:42 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@75d35b4]: revert expect eventgate canary events in all dcs
* 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:31 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thumbor1006.eqiad.wmnet with reason: REIMAGE
* 18:06 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add new Shellboxes (duration: 01m 16s)
* 18:30 awight@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/VisualEditor: Backport: [[gerrit:700734{{!}}Revert "Fall back from explicit parameter order to TemplateData sort" ()]] (duration: 01m 09s)
* 18:03 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 18:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thumbor1006.eqiad.wmnet with reason: REIMAGE
* 17:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thumbor1005.eqiad.wmnet with reason: REIMAGE
* 17:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:27 awight@deploy1002: sync-file aborted: Backport: [[gerrit:700734{{!}}Revert "Fall back from explicit parameter order to TemplateData sort" ()]] (duration: 00m 40s)
* 17:38 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 18:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thumbor1005.eqiad.wmnet with reason: REIMAGE
* 17:38 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/includes/api/: Restore deprecated API token methods (3/3) (duration: 01m 07s)
* 18:19 legoktm: pulled in updates for thirdparty/kubeadm-k8s-1-18 buster-wikimedia on apt1001
* 17:36 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/autoload.php: Restore deprecated API token methods (2/3) (duration: 01m 05s)
* 17:47 brennen: gitlab1001: run ansible to deploy https://gerrit.wikimedia.org/r/700851 ([[phab:T274463|T274463]])
* 17:34 legoktm@deploy1002: Synchronized php-1.38.0-wmf.1/includes/api/ApiTokens.php: Restore deprecated API token methods (1/3) (duration: 01m 05s)
* 17:43 dduvall: testwikis to 1.37.0-wmf.11 (cc open blockers [[phab:T285125|T285125]] [[phab:T285118|T285118]] [[phab:T271011|T271011]])
* 16:58 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
* 17:41 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.11 (duration: 30m 59s)
* 16:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:21 moritzm: installing isc-dhcp security updates
* 16:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
* 17:18 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:16 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 16:53 volans@cumin1001: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1002.eqiad.wmnet
* 17:14 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:50 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Remove wmgFileBlacklist (duration: 01m 06s)
* 17:11 moritzm: installing ruby-websocket-extensions security updates
* 16:49 joal@deploy1002: Finished deploy [analytics/refinery@04aae46] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@04aae46] (duration: 06m 17s)
* 17:10 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.11
* 16:48 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use wmgProhibitedFileExtensions (duration: 01m 05s)
* 17:08 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 16:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:07 moritzm: installing velocity security updates
* 16:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:07 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 16:45 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add wmgProhibitedFileExtensions (duration: 01m 07s)
* 17:04 dduvall: 1.37.0-wmf.11 was branched at {{Gerrit|c161d3bd063b06d09be4167b38a72087db3ba7d2}} for [[phab:T281152|T281152]]
* 16:43 joal@deploy1002: Started deploy [analytics/refinery@04aae46] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@04aae46]
* 17:04 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 16:41 mutante: [netmon1002:~] $ sudo systemctl start rancid-differ
* 16:41 Trey314159: reindexing Dutch wikis on elastic@eqiad, elastic@codfw, and cloudelastic ([[phab:T284185|T284185]])
* 16:41 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename wgShortPagesNamespaceBlacklist to wgShortPagesNamespaceExclusions (duration: 01m 05s)
* 14:57 dcausse@deploy1002: Finished deploy [wdqs/wdqs@b082ccc]: wdqs 0.3.74 (duration: 13m 26s)
* 16:40 mutante: [netmon1002:~] $ sudo systemctl start rancid-clean-logs
* 14:43 dcausse@deploy1002: Started deploy [wdqs/wdqs@b082ccc]: wdqs 0.3.74
* 16:39 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Rename wgEnableUserEmailBlacklist to wgEnableUserEmailMuteList (duration: 01m 05s)
* 14:37 XioNoX: start updating analytics firewall rules to capirca generated ones on cr2-eqiad - [[phab:T279429|T279429]]
* 16:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:35 hoo: Updated the Wikidata property suggester with data from the 2021-05-31 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 16:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:01 XioNoX: start updating analytics firewall rules to capirca generated ones on cr1-eqiad - [[phab:T279429|T279429]]
* 16:37 joal@deploy1002: Finished deploy [analytics/refinery@04aae46] (thin): Regular analytics weekly train THIN [analytics/refinery@04aae46] (duration: 00m 07s)
* 13:49 kormat: disabling puppet on A:db-all for [[phab:T285079|T285079]]
* 16:37 joal@deploy1002: Started deploy [analytics/refinery@04aae46] (thin): Regular analytics weekly train THIN [analytics/refinery@04aae46]
* 13:38 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki-staging/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=nlwiki --phab=[[phab:T285254|T285254]] # [[phab:T285254|T285254]]
* 16:36 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
* 13:37 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki-staging]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=nlwiki growthexperiments # [[phab:T285254|T285254]]
* 16:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:700905{{!}}Correctly enable Vector language switcher treatment A/B test (T269093)]] (duration: 00m 57s)
* 16:35 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use wgMimeTypeExclusions and set wgProhibitedFileExtensions not wgFileBlacklist (duration: 01m 05s)
* 13:29 urbanecm: urbanecm@mwmaint1002:~$ foreachwikiindblist growthexperiments extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments # [[phab:T266913|T266913]]
* 16:32 joal@deploy1002: Finished deploy [analytics/refinery@04aae46]: Regular analytics weekly train [analytics/refinery@04aae46] (duration: 18m 19s)
* 13:29 Trey314159: reindexing German wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete ([[phab:T284185|T284185]])
* 16:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:04 Lucas_WMDE: backport+config window done
* 16:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:03 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:700705{{!}}Enable new Vector Languages-in-header feature & AB test for pilot wikis (T269093)]] (duration: 00m 56s)
* 16:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:58 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug1001:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache && rmdir /srv/mediawiki/php-1.37.0-wmf.1' # per comments in [[phab:T157030|T157030]] and similar tasks
* 16:14 joal@deploy1002: Started deploy [analytics/refinery@04aae46]: Regular analytics weekly train [analytics/refinery@04aae46]
* 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/UniversalLanguageSelector/: Backport: [[gerrit:700730{{!}}launchULS: Add context to interface.language.change hook (T280770)]] (duration: 00m 57s)
* 16:13 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:722916{{!}}Set jQuery migrate to false everywhere except metawiki (T280944)]] (duration: 01m 56s)
* 11:35 moritzm: installing fluidsynth security updates
* 16:08 volans@cumin1001: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host sretest1002.eqiad.wmnet
* 11:17 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:698041{{!}}enwiki: Remove 'collectionsaveascommunitypage' from the 'autoconfirmed' user group (T283523)]] (duration: 00m 56s)
* 15:57 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
* 11:06 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16691 and previous config saved to /var/cache/conftool/dbconfig/20210622-110619-kormat.json
* 15:56 joal@deploy1002: Finished deploy [analytics/refinery@b2ca54f] (hadoop-test): Bugfix analytics deploy TEST [analytics/refinery@b2ca54f] (duration: 06m 17s)
* 10:51 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16690 and previous config saved to /var/cache/conftool/dbconfig/20210622-105115-kormat.json
* 15:52 moritzm: removed filters on mx1001 filtered on the routers due to an issue with the mx1001 reinstall [[phab:T286911|T286911]]
* 10:36 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16689 and previous config saved to /var/cache/conftool/dbconfig/20210622-103612-kormat.json
* 15:49 joal@deploy1002: Started deploy [analytics/refinery@b2ca54f] (hadoop-test): Bugfix analytics deploy TEST [analytics/refinery@b2ca54f]
* 10:21 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16688 and previous config saved to /var/cache/conftool/dbconfig/20210622-102108-kormat.json
* 15:49 joal@deploy1002: Finished deploy [analytics/refinery@b2ca54f] (thin): Bugfix analytics deploy THIN [analytics/refinery@b2ca54f] (duration: 00m 07s)
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16687 and previous config saved to /var/cache/conftool/dbconfig/20210622-094019-root.json
* 15:49 joal@deploy1002: Started deploy [analytics/refinery@b2ca54f] (thin): Bugfix analytics deploy THIN [analytics/refinery@b2ca54f]
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16686 and previous config saved to /var/cache/conftool/dbconfig/20210622-092515-root.json
* 15:16 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@7ed9c3b]: Revert "change tegola uri to test single production node" (duration: 00m 15s)
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16685 and previous config saved to /var/cache/conftool/dbconfig/20210622-092056-root.json
* 15:15 mbsantos@deploy1002: Started deploy [kartotherian/deploy@7ed9c3b]: Revert "change tegola uri to test single production node"
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16684 and previous config saved to /var/cache/conftool/dbconfig/20210622-091012-root.json
* 15:02 moritzm: re-installing mx1001 with bullseye [[phab:T286911|T286911]]
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16683 and previous config saved to /var/cache/conftool/dbconfig/20210622-090552-root.json
* 14:47 volans: upgraded spicerack to 1.0.0 on cumin hosts
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16682 and previous config saved to /var/cache/conftool/dbconfig/20210622-085508-root.json
* 14:14 volans: uploaded spicerack_1.0.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16681 and previous config saved to /var/cache/conftool/dbconfig/20210622-085049-root.json
* 13:39 herron: flushed mx1001 mail queue to mx2001 [[phab:T286911|T286911]]
* 08:49 marostegui: Upgrade db1166
* 13:26 moritzm: mx1001 filtered on the routers for forthcoming reimage to bullseye [[phab:T286911|T286911]]
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16680 and previous config saved to /var/cache/conftool/dbconfig/20210622-084915-marostegui.json
* 13:23 joal@deploy1002: Finished deploy [analytics/refinery@b2ca54f]: Bugfix analytics deploy [analytics/refinery@b2ca54f] (duration: 18m 25s)
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16679 and previous config saved to /var/cache/conftool/dbconfig/20210622-083545-root.json
* 13:09 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@3293ce1]: tegola: increase mirrored requests to 10% (duration: 00m 14s)
* 07:53 joe: uploaded wmf-certificates package to buster-wikimedia/main, [[phab:T284417|T284417]]
* 13:09 mbsantos@deploy1002: Started deploy [kartotherian/deploy@3293ce1]: tegola: increase mirrored requests to 10%
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 [[phab:T283499|T283499]]', diff saved to https://phabricator.wikimedia.org/P16678 and previous config saved to /var/cache/conftool/dbconfig/20210622-072828-marostegui.json
* 13:04 joal@deploy1002: Started deploy [analytics/refinery@b2ca54f]: Bugfix analytics deploy [analytics/refinery@b2ca54f]
* 06:43 dcausse: repool wdqs1005
* 12:56 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@5617839]: tegola: increase mirrored requests to 5% (duration: 00m 15s)
* 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1100.eqiad.wmnet with reason: REIMAGE
* 12:55 mbsantos@deploy1002: Started deploy [kartotherian/deploy@5617839]: tegola: increase mirrored requests to 5%
* 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1100.eqiad.wmnet with reason: REIMAGE
* 12:46 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@8765218]: change tegola uri to test single production node (duration: 00m 14s)
* 05:06 marostegui: Stop replication on old s5 master (db1100) - [[phab:T284529|T284529]]
* 12:46 mbsantos@deploy1002: Started deploy [kartotherian/deploy@8765218]: change tegola uri to test single production node
* 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool old master running 10.1 [[phab:T284529|T284529]]', diff saved to https://phabricator.wikimedia.org/P16677 and previous config saved to /var/cache/conftool/dbconfig/20210622-050602-marostegui.json
* 11:46 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1130 to s5 master and set section read-write [[phab:T284529|T284529]]', diff saved to https://phabricator.wikimedia.org/P16676 and previous config saved to /var/cache/conftool/dbconfig/20210622-050123-root.json
* 11:38 jbond: enable puppet fleet wide post puppetdb restart
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - [[phab:T284529|T284529]]', diff saved to https://phabricator.wikimedia.org/P16675 and previous config saved to /var/cache/conftool/dbconfig/20210622-050036-root.json
* 11:33 jbond: disable puppet fleet wide to perform puppetdb restart
* 05:00 marostegui: Starting s5 eqiad failover from db1100 to db1130 - [[phab:T284529|T284529]]
* 11:11 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 04:20 marostegui: Start topology changes for s5 switchover [[phab:T284529|T284529]]
* 10:50 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 04:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s5 [[phab:T284529|T284529]]
* 10:31 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 04:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s5 [[phab:T284529|T284529]]
* 10:20 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 04:11 eileen: process-control config revision is {{Gerrit|4ab72c1033}}
* 09:51 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 01:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti2026.codfw.wmnet with reason: REIMAGE
* 09:38 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 01:00 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2026.codfw.wmnet with reason: REIMAGE
* 08:46 effie: upgrade php7.2 on api-canaries and restart service - [[phab:T291052|T291052]]
* 00:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti2025.codfw.wmnet with reason: REIMAGE
* 06:02 elukey: update pcc facts
* 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2025.codfw.wmnet with reason: REIMAGE
* 05:48 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox-syntaxhighlight
* 05:48 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox-timeline
* 05:47 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox-media
* 05:31 legoktm: restarting pybal on lvs2009
* 05:27 legoktm: restarting pybal on lvs2010
* 05:23 legoktm: restarting pybal on lvs1015
* 05:17 legoktm: restarting pybal on lvs1016
* 05:12 legoktm: sudo cumin 'O:lvs::balancer' 'run-puppet-agent'
* 04:48 legoktm: ran authdns-update for adding new shellbox svc entries https://gerrit.wikimedia.org/r/721908


== 2021-06-21 ==
== 2021-09-21 ==
* 23:16 krinkle@deploy1002: Synchronized wmf-config/mc.php: {{Gerrit|I13646a5557c9}} (duration: 00m 55s)
* 23:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:12 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I302a71d19604a45c54c}} (duration: 00m 56s)
* 23:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:08 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Idcac4d0345059f84}} (duration: 00m 56s)
* 22:56 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:05 krinkle@deploy1002: Synchronized wmf-config/mc.php: {{Gerrit|I877a3ee1bbffb1e0}} (duration: 00m 57s)
* 22:29 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 23:04 krinkle@deploy1002: Synchronized wmf-config/mc.php: {{Gerrit|Icc2676c23b808d}} (duration: 00m 56s)
* 21:58 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 22:57 krinkle@deploy1002: Synchronized wmf-config/mc.php: {{Gerrit|Iea94283c53}} (duration: 00m 57s)
* 21:16 cstone: payments-wiki revision is {{Gerrit|23d0ffac66}}
* 22:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Iea94283c53}} (duration: 00m 57s)
* 19:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:42 eileen: civicrm revision changed from {{Gerrit|0fca489063}} to {{Gerrit|629bd3b7b7}}, config revision is {{Gerrit|2aed6ff89b}}
* 19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:41 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=viwiki --fix # [[phab:T284868|T284868]] # P16674
* 19:54 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable 'DuplicateParse' logging bucket (duration: 01m 07s)
* 22:13 eileen: civicrm revision changed from {{Gerrit|acbcce94a2}} to {{Gerrit|0fca489063}}, config revision is {{Gerrit|2aed6ff89b}}
* 19:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:11 sbassett: Deployed security patch for [[phab:T285190|T285190]]
* 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:19 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on doh1001.wikimedia.org with reason: temporarily depooling host
* 19:10 ryankemper: [[phab:T280001|T280001]] `sre.dns.netbox` completed successfully
* 19:19 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on doh1001.wikimedia.org with reason: temporarily depooling host
* 19:06 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.1
* 18:41 ppchelko@deploy1002: Synchronized wmf-config/wikitech.php: Replace uses of AbstractBlock::getTarget() [[phab:T284141|T284141]] (duration: 00m 58s)
* 19:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:30 urbanecm@deploy1002: Synchronized wmf-config/PoolCounterSettings.php: {{Gerrit|af61f1a9ffc4703ad9514eac96655c3dd9d491d8}}: Add pool counter for automated search requests ([[phab:T284479|T284479]]) (duration: 00m 59s)
* 18:57 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
* 18:30 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@40b4b2f]: [[phab:T273854|T273854]] Airflow dag to extract and process sparql queries (duration: 07m 11s)
* 18:56 ryankemper: [[phab:T280001|T280001]] Running `sudo -i cookbook sre.dns.netbox -t [[phab:T280001|T280001]] 'Added wcqs.svc.[eqiad,codfw].wmnet'` per final step of https://wikitech.wikimedia.org/wiki/LVS#DNS_changes_(svc_zone_only)...
* 18:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f7db2b94d62c01cd62ec88d25675d3349d9c1b98}}: Enable wikilove on hewikisource ([[phab:T284864|T284864]]) (duration: 00m 56s)
* 18:53 ryankemper: [[phab:T280001|T280001]] `for i in 0 1 2 ; do dig @ns$<nowiki>{</nowiki>i<nowiki>}</nowiki>.wikimedia.org -t any wcqs.svc.[eqiad,codfw].wmnet ; done` looks as expected
* 18:26 brennen: gitlab1001: running ansible for copying latest backup to dedicated folder ([[phab:T274463|T274463]])
* 18:48 ryankemper: [[phab:T280001|T280001]] `OK - authdns-update successful on all nodes!`
* 18:24 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewikisource wikilove # [[phab:T284864|T284864]]
* 18:45 ryankemper: [[phab:T280001|T280001]] `ryankemper@authdns1001:~$ sudo authdns-update`
* 18:23 urbanecm: Correction: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hiwikisource wikilove # [[phab:T284864|T284864]]
* 18:44 ryankemper: [[phab:T280001|T280001]] Merging https://gerrit.wikimedia.org/r/c/operations/dns/+/713929; will follow steps in https://wikitech.wikimedia.org/wiki/DNS#Changing_records_in_a_zonefile post-merge
* 18:23 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hiwikisource # [[phab:T284864|T284864]]
* 17:56 cstone: payments-wiki revision is {{Gerrit|23d0ffac66}}
* 18:22 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@40b4b2f]: [[phab:T273854|T273854]] Airflow dag to extract and process sparql queries
* 17:49 dduvall: 1.38.0-wmf.1 deployed to testwikis ([[phab:T281165|T281165]])
* 18:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dd0fecb89da39611cdf18584626508741cf369e9}}: Rename Portal and Portal talk namespaces on viwiki ([[phab:T284868|T284868]]) (duration: 00m 56s)
* 17:48 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5d8b9df1d6c77cae8fa18c343e5f1ec139d603b0}}: Disable Education Program namespaces in enwiki ([[phab:T285193|T285193]]) (duration: 00m 58s)
* 17:48 dduvall@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.1 (duration: 35m 44s)
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/abusefilter.php: {{Gerrit|5a51dd2540cd620cec60b9921b57c8ab185df029}}: Add `managechangetags` to the `abusefilter` group on eswiki ([[phab:T285167|T285167]]) (duration: 00m 56s)
* 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|219dd5b79d59b98d13e34f7d6e22a41ae1dd9cbb}}: eswiki AbuseFilter config changes ([[phab:T284797|T284797]]; 2/2) (duration: 00m 56s)
* 17:39 elukey: update pcc facts
* 18:10 urbanecm@deploy1002: Synchronized wmf-config/abusefilter.php: {{Gerrit|219dd5b79d59b98d13e34f7d6e22a41ae1dd9cbb}}: eswiki AbuseFilter config changes ([[phab:T284797|T284797]]; 1/2) (duration: 01m 07s)
* 17:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:40 ebernhardson: post-deploy restart airflow-webserver and airflow-scheduler on an-airflow1001
* 17:35 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:32 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2337592]: airflow: expect eventgate canary events in all dcs (duration: 04m 24s)
* 17:27 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:27 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2337592]: airflow: expect eventgate canary events in all dcs
* 17:12 dduvall@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.1
* 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:08 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:32 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:46 papaul: poweroff elastic2043 for maintenance
* 16:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:25 hashar: Updated operations-puppet-tests-buster-docker Jenkins job to use latest Docker image https://gerrit.wikimedia.org/r/c/integration/config/+/700648
* 16:51 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1009.eqiad.wmnet
* 16:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:02 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:01 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps1009.eqiad.wmnet
* 16:33 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:57 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 16:14 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:57 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 15:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:52 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:52 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 15:46 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:47 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 15:39 elukey: update pcc facts
* 14:44 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 15:26 effie: upgrade php7.2 on app-canaries and restart service - [[phab:T291052|T291052]]
* 14:40 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 15:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:39 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet
* 15:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:37 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1002.eqiad.wmnet
* 15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Remove s10 from codfw [[phab:T167973|T167973]]', diff saved to https://phabricator.wikimedia.org/P17307 and previous config saved to /var/cache/conftool/dbconfig/20210921-150958-marostegui.json
* 14:37 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
* 14:35 XioNoX: re-enable AMS-IX peering sessions - [[phab:T291407|T291407]]
* 14:34 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
* 14:17 XioNoX: temporarily downpref Telia-Deutsche Telekom to not saturate Telia transit - [[phab:T291407|T291407]]
* 14:30 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
* 13:52 XioNoX: disable AMS-IX peering sessions for maintenance - [[phab:T291407|T291407]]
* 14:28 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
* 13:48 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
* 13:48 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1123.eqiad.wmnet with reason: REIMAGE
* 13:41 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:22 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
* 13:41 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1123.eqiad.wmnet with reason: REIMAGE
* 13:37 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:21 volans: deployed spicerack release v0.0.54 on the cumin hosts
* 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 14:19 XioNoX: reboot scs-c1-codfw - [[phab:T285229|T285229]]
* 13:18 effie: upgrading php on wtp* servers to 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf2 && rolling service restart - [[phab:T291052|T291052]]
* 14:18 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
* 13:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:17 XioNoX: reboot scs-a1-codfw - [[phab:T285229|T285229]]
* 12:01 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2025.codfw.wmnet
* 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1008.eqiad.wmnet
* 11:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 14:16 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
* 11:55 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:14 klausman: starting update of ML team's etcd machines in eqiad
* 11:46 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 14:14 volans: uploaded spicerack_0.0.54 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 11:45 jgiannelos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Configure event stream for map tile state change - {{Gerrit|3b01ef587}} (duration: 00m 57s)
* 14:11 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 11:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 14:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1008.eqiad.wmnet
* 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 14:06 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 11:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:05 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:04 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 13:58 XioNoX: reboot scs-eqsin - [[phab:T285229|T285229]]
* 10:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 13:58 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
* 13:57 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 10:25 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1006.eqiad.wmnet
* 09:59 _joe_: rebuilding openjdk8* image, ruby, nodejs-slim for [[phab:T291458|T291458]]
* 13:56 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 09:46 _joe_: deneb:~# docker-registryctl delete-tags docker-registry.wikimedia.org/fluentd [[phab:T291458|T291458]]
* 13:55 jynus: stopping replication at db1171:s3 at db1123-bin.004363:{{Gerrit|906878073}}
* 09:44 _joe_: deleting images for graphoid, [[phab:T291458|T291458]]
* 13:51 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 05:16 kart_: Upgraded cxserver to 2021-09-16-130208-production
* 13:51 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 05:12 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1006.eqiad.wmnet
* 05:03 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 13:48 XioNoX: reboot scs-ulsfo
* 04:58 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 13:45 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:40 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:38 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:35 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet
* 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/MobileFrontend/includes/ExtMobileFrontend.php: Backport: [[gerrit:700344{{!}}Avoid loading the whole entity when it only needs description. (T269960)]] (duration: 00m 58s)
* 00:16 tgr: Evening deploys done
* 13:28 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 00:16 tgr@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/addlink/AddLinkArticleTarget.js: Backport: [[gerrit:722449{{!}}AddLink: Skip over headings in phrase matching (T291361)]] (duration: 00m 57s)
* 13:24 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2003.codfw.wmnet
* 00:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:21 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2003.codfw.wmnet
* 00:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:21 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2002.codfw.wmnet
* 13:19 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2002.codfw.wmnet
* 13:17 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2001.codfw.wmnet
* 13:14 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2001.codfw.wmnet
* 13:12 elukey: upload istioctl 1.9.5 to <nowiki>{</nowiki>buster,stretch<nowiki>}</nowiki>-wikimedia
* 13:12 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 40 hosts with reason: Merged broken patch
* 13:12 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 40 hosts with reason: Merged broken patch
* 13:09 klausman: starting update of ML team's etcd machines in codfw
* 12:55 godog: move librenms alerts with "max alerts" == -1 to "interval" being 15m - [[phab:T285205|T285205]]
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16672 and previous config saved to /var/cache/conftool/dbconfig/20210621-124030-root.json
* 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16671 and previous config saved to /var/cache/conftool/dbconfig/20210621-123906-root.json
* 12:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Wikibase: Backport: [[gerrit:700528{{!}}Rewrite SerializationModifier to be more efficient]] (duration: 01m 02s)
* 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1010.eqiad.wmnet
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16670 and previous config saved to /var/cache/conftool/dbconfig/20210621-122526-root.json
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16669 and previous config saved to /var/cache/conftool/dbconfig/20210621-122403-root.json
* 12:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1010.eqiad.wmnet
* 12:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2008.codfw.wmnet
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16668 and previous config saved to /var/cache/conftool/dbconfig/20210621-121023-root.json
* 12:10 godog: bump space for k8s and ops prometheus on prometheus1004 (prometheus1003 was expanded previously but not logged)
* 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16667 and previous config saved to /var/cache/conftool/dbconfig/20210621-120859-root.json
* 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2008.codfw.wmnet
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16665 and previous config saved to /var/cache/conftool/dbconfig/20210621-115519-root.json
* 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 [[phab:T283499|T283499]]', diff saved to https://phabricator.wikimedia.org/P16664 and previous config saved to /var/cache/conftool/dbconfig/20210621-115441-marostegui.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16663 and previous config saved to /var/cache/conftool/dbconfig/20210621-115355-root.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 [[phab:T283499|T283499]]', diff saved to https://phabricator.wikimedia.org/P16662 and previous config saved to /var/cache/conftool/dbconfig/20210621-115143-marostegui.json
* 11:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bf35e0d3934a2c44e95fc4d7bc30a2a6c74c59a}}: Disable indexing user (sub)pages and draft-related pages on hrwiki ([[phab:T284384|T284384]]) (duration: 00m 56s)
* 11:21 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|1b97376dea470122f9da9ea8dddde7e8fe94231f}}: Change vi.wikisource logo to the same logo being used at en.wikisource ([[phab:T284612|T284612]]) (duration: 00m 56s)
* 11:20 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|1b97376dea470122f9da9ea8dddde7e8fe94231f}}: Change vi.wikisource logo to the same logo being used at en.wikisource ([[phab:T284612|T284612]]) (duration: 00m 57s)
* 11:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|464cc0bb0d9ff6c47e2a90cba5232e9506ff5dec}}: ptwikinews: Remove NS ID 102,103 ([[phab:T285163|T285163]]) (duration: 00m 56s)
* 11:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:700160{{!}}Add WMCS public addresses to $wgSoftBlockRanges]] (duration: 00m 56s)
* 11:04 jbond@deploy1002: Finished deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4  (duration: 02m 53s)
* 11:01 jbond@deploy1002: Started deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4
* 10:55 moritzm: restarting FPM on mw canaries to pick up nettle security updates
* 10:45 volans@deploy1002: Finished deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv (duration: 00m 54s)
* 10:45 moritzm: installing nettle security updates on buster
* 10:44 volans@deploy1002: Started deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv
* 10:44 volans@deploy1002: Finished deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv (duration: 00m 54s)
* 10:43 volans@deploy1002: Started deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv
* 10:41 volans@deploy1002: Finished deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv (duration: 00m 50s)
* 10:41 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:692281{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 10:40 volans@deploy1002: Started deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv
* 10:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:692281{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 10:37 jbond@deploy1002: Finished deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4  (duration: 02m 22s)
* 10:36 jbond@deploy1002: Started deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4
* 10:36 jbond@deploy1002: Finished deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 to netbox-next (duration: 00m 56s)
* 10:29 jbond@deploy1002: Started deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 to netbox-next
* 10:27 jbond@deploy1002: Finished deploy [netbox/deploy@6b69f2c]: deploy v2.10.4-wmf4 to netbox-next (duration: 03m 12s)
* 10:24 jbond@deploy1002: Started deploy [netbox/deploy@6b69f2c]: deploy v2.10.4-wmf4 to netbox-next
* 10:22 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 02m 22s)
* 10:20 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
* 10:19 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 02m 13s)
* 10:17 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
* 10:16 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 01m 03s)
* 10:15 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
* 10:15 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 01m 30s)
* 10:13 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
* 10:13 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 (duration: 03m 10s)
* 10:10 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4
* 09:55 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/FlaggedRevs: Backport: [[gerrit:700340{{!}}Drop LocalFile::getHistory hook handler (T284777 T277883)]] (duration: 00m 58s)
* 09:52 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:698226{{!}}Enable wikisource group as langlink group of sourcewiki (T275958)]] (duration: 00m 56s)
* 09:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:700491{{!}}Set $wmgWikibaseTmpSerializeEmptyListsAsObjects to true everywhere (T241422)]] (duration: 00m 57s)
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16659 and previous config saved to /var/cache/conftool/dbconfig/20210621-094049-root.json
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1130 with weight 0 [[phab:T284529|T284529]]', diff saved to https://phabricator.wikimedia.org/P16658 and previous config saved to /var/cache/conftool/dbconfig/20210621-092623-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16657 and previous config saved to /var/cache/conftool/dbconfig/20210621-092545-root.json
* 09:19 ladsgroup@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 04m 49s)
* 09:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16656 and previous config saved to /var/cache/conftool/dbconfig/20210621-091041-root.json
* 09:02 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:56 marostegui: Deploy [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]] on db1123
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16655 and previous config saved to /var/cache/conftool/dbconfig/20210621-085538-root.json
* 08:31 dcausse: depooling wdqs1005 (lag)
* 07:47 moritzm: updated buster d-i image for Buster 10.10 point release (which included ABI bump for Linux kernel)
* 07:44 jayme: started debian-weekly-rebuild.service on deneb (it failed due to 404 on snapshots.debian.org yesterday)
* 06:49 moritzm: installing libwebp security updates on buster
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16654 and previous config saved to /var/cache/conftool/dbconfig/20210621-062156-root.json
* 06:20 marostegui: Re-add rev_page_id to db1135 [[phab:T163532|T163532]] [[phab:T285149|T285149]]
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 [[phab:T163532|T163532]]', diff saved to https://phabricator.wikimedia.org/P16653 and previous config saved to /var/cache/conftool/dbconfig/20210621-062014-marostegui.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16652 and previous config saved to /var/cache/conftool/dbconfig/20210621-060652-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16651 and previous config saved to /var/cache/conftool/dbconfig/20210621-055149-root.json
* 05:50 kart_: cxserver: Added support for Elia MT + Updated to 2021-06-10-074331-production ([[phab:T276059|T276059]], [[phab:T275803|T275803]], [[phab:T276246|T276246]], [[phab:T283513|T283513]], [[phab:T255231|T255231]], [[phab:T237028|T237028]])
* 05:41 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16650 and previous config saved to /var/cache/conftool/dbconfig/20210621-053645-root.json
* 05:33 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:31 kormat: stopping replication on db1123 [[phab:T283131|T283131]]
* 05:25 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 05:11 kormat@cumin1001: dbctl commit (dc=all): 'Depool db1123 until it's reimaged to buster [[phab:T284648|T284648]]', diff saved to https://phabricator.wikimedia.org/P16649 and previous config saved to /var/cache/conftool/dbconfig/20210621-051149-kormat.json
* 05:05 kormat@cumin1001: dbctl commit (dc=all): 'Promote db1157 to s3 master and set section read-write [[phab:T284648|T284648]]', diff saved to https://phabricator.wikimedia.org/P16648 and previous config saved to /var/cache/conftool/dbconfig/20210621-050506-kormat.json
* 05:03 kormat@cumin1001: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - [[phab:T284648|T284648]]', diff saved to https://phabricator.wikimedia.org/P16647 and previous config saved to /var/cache/conftool/dbconfig/20210621-050304-kormat.json
* 05:02 kormat: Starting s3 eqiad failover from db1123 to db1157 - [[phab:T284648|T284648]]
* 04:49 kormat@cumin1001: dbctl commit (dc=all): 'Set db1157 with weight 0 [[phab:T284648|T284648]]', diff saved to https://phabricator.wikimedia.org/P16646 and previous config saved to /var/cache/conftool/dbconfig/20210621-044955-kormat.json
* 04:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 21 hosts with reason: Master switchover s3 [[phab:T284648|T284648]]
* 04:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 21 hosts with reason: Master switchover s3 [[phab:T284648|T284648]]
* 04:40 marostegui: Re-add rev_page_id to db1099:3311 [[phab:T163532|T163532]] [[phab:T285149|T285149]]
* 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 [[phab:T163532|T163532]]', diff saved to https://phabricator.wikimedia.org/P16645 and previous config saved to /var/cache/conftool/dbconfig/20210621-043941-marostegui.json


== 2021-06-18 ==
== 2021-09-20 ==
* 20:55 Krinkle: Remove doc1001:/srv/doc/mediawiki-core/wmf-1.36.0-wmf.31-testing
* 23:31 ejegg: updated fundraising CiviCRM from {{Gerrit|e6bf81d99c}} to {{Gerrit|14658445a2}}
* 13:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:29 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16640 and previous config saved to /var/cache/conftool/dbconfig/20210618-125306-root.json
* 23:22 mutante: LDAP - added georginaburnett-wmde to NDA group ([[phab:T291391|T291391]], [[phab:T273780|T273780]])
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16639 and previous config saved to /var/cache/conftool/dbconfig/20210618-123802-root.json
* 23:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16638 and previous config saved to /var/cache/conftool/dbconfig/20210618-122526-root.json
* 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16637 and previous config saved to /var/cache/conftool/dbconfig/20210618-122259-root.json
* 22:21 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16636 and previous config saved to /var/cache/conftool/dbconfig/20210618-121022-root.json
* 22:14 mutante: wdqs1004 - depool
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16635 and previous config saved to /var/cache/conftool/dbconfig/20210618-120755-root.json
* 22:10 mutante: wdqs1004 - service wdqs-updater restart
* 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16634 and previous config saved to /var/cache/conftool/dbconfig/20210618-115518-root.json
* 22:06 mutante: wdqs1004 - HTTP/1.1 503 Service Unavailable - systemctl restart wdqs-blazegraph
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16633 and previous config saved to /var/cache/conftool/dbconfig/20210618-114015-root.json
* 22:05 foks: changing user email for MIskander (WMF)@collabwiki
* 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16631 and previous config saved to /var/cache/conftool/dbconfig/20210618-112739-marostegui.json
* 21:41 mutante: ms-fe1005 - systemctl start swift_dispersion_stats.service (gerrit:719285)
* 09:44 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:30 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 09:21 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:49 XioNoX: eqsin-codfw link re-enabled but drained
* 19:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:39 legoktm: finished adding shellbox LVS entry, https://shellbox.svc.eqiad.wmnet:4008/ and https://shellbox.svc.codfw.wmnet:4008/ now work ([[phab:T281423|T281423]])
* 19:45 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Revert "Disable jQuery Migrate on group1" ([[phab:T291410|T291410]]) (duration: 00m 56s)
* 08:30 XioNoX: cr1-codfw# set interfaces xe-5/1/2 disable
* 17:02 legoktm: repooled codfw (traffic/caches) 1 week after DC switchover
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16630 and previous config saved to /var/cache/conftool/dbconfig/20210618-081737-root.json
* 16:41 effie: upgrading php on wtp[1025-1029] to 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf2 - [[phab:T291052|T291052]]
* 08:06 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox
* 16:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16629 and previous config saved to /var/cache/conftool/dbconfig/20210618-080233-root.json
* 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16628 and previous config saved to /var/cache/conftool/dbconfig/20210618-074729-root.json
* 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: After migrating wikitech to codfw', diff saved to https://phabricator.wikimedia.org/P17305 and previous config saved to /var/cache/conftool/dbconfig/20210920-144844-root.json
* 07:44 legoktm: restarting pybal on lvs1015, lvs2009 (active) - [[phab:T281423|T281423]]
* 14:42 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:35 legoktm: restarting pybal on lvs1016, lvs2010 to add shellbox
* 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: After migrating wikitech to codfw', diff saved to https://phabricator.wikimedia.org/P17304 and previous config saved to /var/cache/conftool/dbconfig/20210920-143340-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16627 and previous config saved to /var/cache/conftool/dbconfig/20210618-073225-root.json
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: After migrating wikitech to codfw', diff saved to https://phabricator.wikimedia.org/P17303 and previous config saved to /var/cache/conftool/dbconfig/20210920-141836-root.json
* 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2010.codfw.wmnet
* 14:11 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2010.codfw.wmnet
* 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: After migrating wikitech to codfw', diff saved to https://phabricator.wikimedia.org/P17302 and previous config saved to /var/cache/conftool/dbconfig/20210920-140333-root.json
* 06:58 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1002.wikimedia.org
* 13:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16626 and previous config saved to /var/cache/conftool/dbconfig/20210618-063632-root.json
* 13:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P16625 and previous config saved to /var/cache/conftool/dbconfig/20210618-062452-marostegui.json
* 13:45 moritzm: restarting apache on Logstash ELK5 cluster to pick up GNUTLS update [[phab:T283165|T283165]]
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16624 and previous config saved to /var/cache/conftool/dbconfig/20210618-062129-root.json
* 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16623 and previous config saved to /var/cache/conftool/dbconfig/20210618-060625-root.json
* 13:20 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons. - elukey@cumin1001
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16622 and previous config saved to /var/cache/conftool/dbconfig/20210618-060452-root.json
* 13:13 damilare: updated payments-wiki from {{Gerrit|f9cbf95a12}} to {{Gerrit|23d0ffac66}}
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16621 and previous config saved to /var/cache/conftool/dbconfig/20210618-055122-root.json
* 12:59 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16620 and previous config saved to /var/cache/conftool/dbconfig/20210618-054949-root.json
* 12:58 marostegui: Drop ct_tag_id_log key from db1144:3314 [[phab:T277416|T277416]]
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165', diff saved to https://phabricator.wikimedia.org/P16619 and previous config saved to /var/cache/conftool/dbconfig/20210618-054841-marostegui.json
* 12:54 moritzm: installing gnutls28 updates for stretch with backport for forthcoming Let's Encrypt issuance chain update ([[phab:T283165|T283165]])
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16618 and previous config saved to /var/cache/conftool/dbconfig/20210618-054659-root.json
* 12:42 marostegui: Add ct_tag_id_log key to db1144:3314 [[phab:T277416|T277416]]
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16617 and previous config saved to /var/cache/conftool/dbconfig/20210618-053445-root.json
* 11:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16616 and previous config saved to /var/cache/conftool/dbconfig/20210618-053156-root.json
* 11:48 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16615 and previous config saved to /var/cache/conftool/dbconfig/20210618-051942-root.json
* 11:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131', diff saved to https://phabricator.wikimedia.org/P16614 and previous config saved to /var/cache/conftool/dbconfig/20210618-051712-marostegui.json
* 11:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16613 and previous config saved to /var/cache/conftool/dbconfig/20210618-051652-root.json
* 11:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16612 and previous config saved to /var/cache/conftool/dbconfig/20210618-050148-root.json
* 11:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16611 and previous config saved to /var/cache/conftool/dbconfig/20210618-045808-marostegui.json
* 11:31 urbanecm@deploy1002: Finished scap: {{Gerrit|b9031bc}}: Mentor dashboard: Mentor tools ([[phab:T280307|T280307]]) (duration: 11m 44s)
* 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16610 and previous config saved to /var/cache/conftool/dbconfig/20210618-045743-marostegui.json
* 11:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16609 and previous config saved to /var/cache/conftool/dbconfig/20210618-045355-marostegui.json
* 11:20 urbanecm@deploy1002: Started scap: {{Gerrit|b9031bc}}: Mentor dashboard: Mentor tools ([[phab:T280307|T280307]])
* 11:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:722348{{!}}Disable jQuery Migrate on group1 (T280944)]] (duration: 00m 56s)
* 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b518d8ba03e85afdf98f2e06bf569b4f2b551b1b}}: Mentor dashboard: Enable beta mode at testwiki ([[phab:T281534|T281534]]) (duration: 00m 55s)
* 11:11 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/: {{Gerrit|b9031bc572f6e3f4e12e6102c2816467af3580f4}}: Mentor dashboard: Mentor tools ([[phab:T280307|T280307]]; 5) (duration: 00m 56s)
* 11:10 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/ServiceWiring.php: {{Gerrit|b9031bc572f6e3f4e12e6102c2816467af3580f4}}: Mentor dashboard: Mentor tools ([[phab:T280307|T280307]]; 4) (duration: 00m 56s)
* 11:09 hnowlan: roll restarting restbase service in codfw
* 11:08 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/MentorDashboard/Modules/MentorTools.php: {{Gerrit|b9031bc572f6e3f4e12e6102c2816467af3580f4}}: Mentor dashboard: Mentor tools ([[phab:T280307|T280307]]; 2) (duration: 00m 55s)
* 11:07 urbanecm@deploy1002: sync-file aborted: {{Gerrit|b9031bc572f6e3f4e12e6102c2816467af3580f4}}: Mentor dashboard: Mentor tools ([[phab:T280307|T280307]]; 1) (duration: 00m 00s)
* 11:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/MentorDashboard/MentorTools/MentorStatusManager.php: {{Gerrit|b9031bc572f6e3f4e12e6102c2816467af3580f4}}: Mentor dashboard: Mentor tools ([[phab:T280307|T280307]]; 1) (duration: 00m 57s)
* 11:05 hnowlan: roll restarting restbase service in eqiad for openssl updates
* 10:45 hnowlan: roll restarting kartotherian and tilerator on maps2*
* 10:41 hnowlan: roll restarting kartotherian and tilerator on maps1*
* 10:36 jynus: rolling restart bacula & minio daemons on backup hosts
* 09:59 moritzm: restarting apache2 on thorium
* 09:48 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Remove s10 from eqiad [[phab:T167973|T167973]]', diff saved to https://phabricator.wikimedia.org/P17300 and previous config saved to /var/cache/conftool/dbconfig/20210920-094739-marostegui.json
* 09:10 moritzm: installing openssl1.0 updates for stretch with backport for forthcoming Let's Encrypt issuance chain update ([[phab:T283165|T283165]])
* 08:35 moritzm: updating clamav on ticket.wikimedia.org/otrs1001 to 0.103.3
* 08:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:49 moritzm: uploaded maps-deduped-tilelist 0.0.3~deb10u1 to buster-wikimedia/main [[phab:T290982|T290982]]
* 07:48 moritzm: uploaded maps-deduped-tilelist 0.0.3~deb10u1 to buster-wikimedia/main
* 07:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:43 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:35 marostegui: Stop db1168 and db2129 in sync [[phab:T167973|T167973]]
* 07:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:34 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|af9d6e4e29e5f53ad8cf5aa2c235d54500c433bd}}: Revert "Add throttle rule for Czech wiki course" (duration: 00m 56s)
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 [[phab:T167973|T167973]]', diff saved to https://phabricator.wikimedia.org/P17299 and previous config saved to /var/cache/conftool/dbconfig/20210920-073256-marostegui.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 [[phab:T167973|T167973]]', diff saved to https://phabricator.wikimedia.org/P17298 and previous config saved to /var/cache/conftool/dbconfig/20210920-073206-marostegui.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 [[phab:T167973|T167973]]', diff saved to https://phabricator.wikimedia.org/P17297 and previous config saved to /var/cache/conftool/dbconfig/20210920-073141-marostegui.json
* 07:31 moritzm: uploaded PHP 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf2 to apt.wikimedia.org (component/php7.2 for buster-wikimedia) [[phab:T291052|T291052]]
* 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8c1d665b5e83f6b1dd1cc4a9c367cb6881473bba}}: enwiki: Bump Growth features to 25% (mentorship limited to 20% of those users) ([[phab:T290927|T290927]]) (duration: 00m 57s)
* 07:20 urbanecm: Revert undeployed config patch (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/721959); not even pulled to deployment, so assuming it never hit prod ([[phab:T289771|T289771]])
* 06:00 marostegui: Upgrade db2071, db2072, db2094


== 2021-06-17 ==
== 2021-09-18 ==
* 21:49 legoktm: regenerating pipermail redirects to skip those with duplicate message-ids ([[phab:T280731|T280731]])
* 01:47 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 00m 57s)
* 18:24 ryankemper: [[phab:T285106|T285106]] [WDQS] `ryankemper@wdqs2001:~$ sudo depool`
* 01:01 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.23/includes/libs/rdbms/database/Database.php: (no justification provided) (duration: 01m 03s)
* 18:01 dancy: Deployed latest scap code to beta cluster
* 13:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Wikibase/client/includes/ClientHooks.php: Backport: [[gerrit:700036{{!}}client: Bring back using the client setting for langlink group (T284854)]] (duration: 00m 58s)
* 13:28 jbond: add prometheus-jmx-exporter to bullseye-wikimedia
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16604 and previous config saved to /var/cache/conftool/dbconfig/20210617-121146-root.json
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16603 and previous config saved to /var/cache/conftool/dbconfig/20210617-120109-root.json
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16602 and previous config saved to /var/cache/conftool/dbconfig/20210617-115643-root.json
* 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16601 and previous config saved to /var/cache/conftool/dbconfig/20210617-115319-root.json
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16600 and previous config saved to /var/cache/conftool/dbconfig/20210617-114605-root.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16599 and previous config saved to /var/cache/conftool/dbconfig/20210617-114139-root.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16598 and previous config saved to /var/cache/conftool/dbconfig/20210617-113816-root.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16597 and previous config saved to /var/cache/conftool/dbconfig/20210617-113101-root.json
* 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16596 and previous config saved to /var/cache/conftool/dbconfig/20210617-112635-root.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P16595 and previous config saved to /var/cache/conftool/dbconfig/20210617-112431-marostegui.json
* 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16594 and previous config saved to /var/cache/conftool/dbconfig/20210617-112312-root.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16593 and previous config saved to /var/cache/conftool/dbconfig/20210617-111558-root.json
* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16592 and previous config saved to /var/cache/conftool/dbconfig/20210617-111026-marostegui.json
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16591 and previous config saved to /var/cache/conftool/dbconfig/20210617-110808-root.json
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16590 and previous config saved to /var/cache/conftool/dbconfig/20210617-110656-root.json
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16589 and previous config saved to /var/cache/conftool/dbconfig/20210617-110200-marostegui.json
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16588 and previous config saved to /var/cache/conftool/dbconfig/20210617-105153-root.json
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16587 and previous config saved to /var/cache/conftool/dbconfig/20210617-103649-root.json
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16586 and previous config saved to /var/cache/conftool/dbconfig/20210617-102145-root.json
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P16585 and previous config saved to /var/cache/conftool/dbconfig/20210617-101827-marostegui.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16584 and previous config saved to /var/cache/conftool/dbconfig/20210617-100445-root.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16583 and previous config saved to /var/cache/conftool/dbconfig/20210617-094942-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16582 and previous config saved to /var/cache/conftool/dbconfig/20210617-093438-root.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16581 and previous config saved to /var/cache/conftool/dbconfig/20210617-092056-root.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16580 and previous config saved to /var/cache/conftool/dbconfig/20210617-091934-root.json
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P16579 and previous config saved to /var/cache/conftool/dbconfig/20210617-090947-marostegui.json
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16578 and previous config saved to /var/cache/conftool/dbconfig/20210617-090552-root.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16577 and previous config saved to /var/cache/conftool/dbconfig/20210617-085048-root.json
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16576 and previous config saved to /var/cache/conftool/dbconfig/20210617-084941-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16575 and previous config saved to /var/cache/conftool/dbconfig/20210617-083545-root.json
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16574 and previous config saved to /var/cache/conftool/dbconfig/20210617-083438-root.json
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P16573 and previous config saved to /var/cache/conftool/dbconfig/20210617-083005-marostegui.json
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P16572 and previous config saved to /var/cache/conftool/dbconfig/20210617-082939-marostegui.json
* 08:28 elukey: upload istioctl 1.6.14-1 to buster-wikimedia
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16571 and previous config saved to /var/cache/conftool/dbconfig/20210617-082437-root.json
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315', diff saved to https://phabricator.wikimedia.org/P16570 and previous config saved to /var/cache/conftool/dbconfig/20210617-082409-marostegui.json
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16569 and previous config saved to /var/cache/conftool/dbconfig/20210617-081934-root.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16568 and previous config saved to /var/cache/conftool/dbconfig/20210617-080933-root.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16567 and previous config saved to /var/cache/conftool/dbconfig/20210617-080430-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16566 and previous config saved to /var/cache/conftool/dbconfig/20210617-075825-marostegui.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16565 and previous config saved to /var/cache/conftool/dbconfig/20210617-075429-root.json
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16564 and previous config saved to /var/cache/conftool/dbconfig/20210617-073926-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P16563 and previous config saved to /var/cache/conftool/dbconfig/20210617-073305-marostegui.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16562 and previous config saved to /var/cache/conftool/dbconfig/20210617-073229-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16561 and previous config saved to /var/cache/conftool/dbconfig/20210617-071726-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16560 and previous config saved to /var/cache/conftool/dbconfig/20210617-070222-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16559 and previous config saved to /var/cache/conftool/dbconfig/20210617-064717-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16558 and previous config saved to /var/cache/conftool/dbconfig/20210617-063135-marostegui.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16557 and previous config saved to /var/cache/conftool/dbconfig/20210617-062514-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16556 and previous config saved to /var/cache/conftool/dbconfig/20210617-061010-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16555 and previous config saved to /var/cache/conftool/dbconfig/20210617-055507-root.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16554 and previous config saved to /var/cache/conftool/dbconfig/20210617-054003-root.json
* 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165', diff saved to https://phabricator.wikimedia.org/P16553 and previous config saved to /var/cache/conftool/dbconfig/20210617-053455-marostegui.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16552 and previous config saved to /var/cache/conftool/dbconfig/20210617-053105-root.json
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16551 and previous config saved to /var/cache/conftool/dbconfig/20210617-051601-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16550 and previous config saved to /var/cache/conftool/dbconfig/20210617-050057-root.json
* 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16549 and previous config saved to /var/cache/conftool/dbconfig/20210617-044554-root.json
* 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P16548 and previous config saved to /var/cache/conftool/dbconfig/20210617-044146-marostegui.json
* 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16547 and previous config saved to /var/cache/conftool/dbconfig/20210617-044132-marostegui.json
* 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16546 and previous config saved to /var/cache/conftool/dbconfig/20210617-043130-marostegui.json


== 2021-06-16 ==
== 2021-09-17 ==
* 21:35 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:32 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 21:19 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 17:41 dancy: Reverted Scap release on beta
* 19:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 16:18 topranks: Resetting metric on Telia CCT IC-331929, cr1-codfw and cr3-eqsin.
* 17:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
* 15:22 dancy: testing upcoming Scap release on beta
* 17:02 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16545 and previous config saved to /var/cache/conftool/dbconfig/20210616-125329-root.json
* 17:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: REIMAGE
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16544 and previous config saved to /var/cache/conftool/dbconfig/20210616-123826-root.json
* 16:48 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 12:34 kormat: deploying heartbeat service puppet change
* 16:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16543 and previous config saved to /var/cache/conftool/dbconfig/20210616-122322-root.json
* 16:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16541 and previous config saved to /var/cache/conftool/dbconfig/20210616-120818-root.json
* 16:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
* 16:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
* 14:49 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131', diff saved to https://phabricator.wikimedia.org/P16540 and previous config saved to /var/cache/conftool/dbconfig/20210616-120015-marostegui.json
* 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16539 and previous config saved to /var/cache/conftool/dbconfig/20210616-112115-root.json
* 13:06 moritzm: installing 4.9.272 kernels on stretch hosts (no reboots yet)
* 11:20 hnowlan: running `nodetool cleanup` on maps1005
* 11:28 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16538 and previous config saved to /var/cache/conftool/dbconfig/20210616-110612-root.json
* 11:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16537 and previous config saved to /var/cache/conftool/dbconfig/20210616-105108-root.json
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1007.eqiad.wmnet with reason: REIMAGE
* 09:37 milimetric@deploy1002: Finished deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency (duration: 00m 07s)
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16536 and previous config saved to /var/cache/conftool/dbconfig/20210616-103604-root.json
* 09:37 milimetric@deploy1002: Started deploy [analytics/refinery@37e904a] (thin): Only syncing sanitize allowlist, deploying THIN for consistency
* 10:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1007.eqiad.wmnet with reason: REIMAGE
* 09:36 milimetric@deploy1002: Finished deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist (duration: 17m 43s)
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16535 and previous config saved to /var/cache/conftool/dbconfig/20210616-102349-marostegui.json
* 09:19 milimetric@deploy1002: Started deploy [analytics/refinery@37e904a]: Only syncing sanitize allowlist
* 09:52 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
* 08:00 jayme: restarting php-fpm on wtp1037 and wtp1030
* 09:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
* 02:28 ryankemper: [[phab:T290330|T290330]] [Remove WDQS codfw ~hourly restarts] Successfully rolled out to rest of fleet `sudo cumin 'C:query_service::crontasks' 'sudo run-puppet-agent --force && sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer'`
* 09:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
* 02:22 ryankemper: [[phab:T290330|T290330]] [Remove WDQS codfw ~hourly restarts] `wdqs2001` and `wdqs2004` look fine after running `sudo systemctl reset-failed wdqs-restart-hourly-w-random-delay.timer` to clean up dangling timer
* 09:50 hnowlan: disabling puppet on maps1* to reparent maps1007 from new master maps1009
* 01:55 ryankemper: [[phab:T290330|T290330]] [Remove WDQS codfw ~hourly restarts] Testing on arbitrary codfw host: `ryankemper@wdqs2001:~$ sudo run-puppet-agent`
* 09:47 kormat: truncating all pc* tables on pc1010 [[phab:T282761|T282761]]
* 01:48 ryankemper: [[phab:T290330|T290330]] [Remove WDQS codfw ~hourly restarts] `sudo cumin 'C:query_service::crontasks' 'sudo disable-puppet "Stop doing wdqs codfw ~hourly restarts - [[phab:T290330|T290330]]"'`
* 09:40 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1009 as pc3 primary [[phab:T282761|T282761]] (duration: 00m 59s)
* 00:04 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 09:04 kormat: Deploying wmfmariadbpy 0.7.1 [[phab:T284819|T284819]]
* 00:01 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 09:04 kormat: uploaded wmfmariadbpy 0.7.1 to apt.wm.o
* 08:24 Amir1: running "update flaggedrevs set fr_quality = 0 where fr_quality != 0;" on all wikis where flagged revs is enabled ([[phab:T279761|T279761]])
* 07:27 dcausse: cleanup old /var/log/airflow/scheduler logs to reclaim space on an-airflow1001
* 06:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:52 volans@cumin1001: START - Cookbook sre.dns.netbox
* 05:06 marostegui: Upgrade clouddb1014


== 2021-06-15 ==
== 2021-09-16 ==
* 17:54 dancy: testing upcoming Scap release on beta
* 23:58 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 17:21 mutante: new Wikimedia language "shi" added - Shilha /ˈʃɪlhə/ is a Berber language native to Shilha people. The endonym is Taclḥit /taʃlʜijt/, and in recent English publications the language is often rendered Tashelhiyt or Tashelhit.
* 23:51 ryankemper: [[phab:T273673|T273673]] All looks good, re-enabling puppet and running on rest of fleet: `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo run-puppet-agent --force'`
* 17:17 mutante: new Wikimedia language "dag" added - Dagbani (or Dagbane), also known as Dagbanli and Dagbanle, is a Gur language spoken in Ghana.
* 23:44 ryankemper: [[phab:T273673|T273673]] The associated crons are gone and I see the new systemd timers for both gc-cleanup and the hot threads logger
* 17:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1002.eqiad.wmnet with reason: REIMAGE
* 23:39 ryankemper: [[phab:T273673|T273673]] Testing elasticsearch cron->systemd timer-job changes on canary instance `ryankemper@elastic1064:~$ sudo run-puppet-agent --force`
* 17:09 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1002.eqiad.wmnet with reason: REIMAGE
* 23:37 ryankemper: [[phab:T273673|T273673]] Disabling puppet on elasticsearch hosts `sudo cumin 'R:Class = elasticsearch::log::hot_threads' 'sudo disable-puppet "https://gerrit.wikimedia.org/r/c/operations/puppet/+/721413 - [[phab:T273673|T273673]]"'`
* 16:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on an-master1002.eqiad.wmnet with reason: Update operating system to bullseye
* 23:21 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:11 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on an-master1002.eqiad.wmnet with reason: Update operating system to bullseye
* 23:21 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 14:55 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:19 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 14:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 23:18 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 14:25 XioNoX: re-enable cr1-codfw:xe-5/1/2
* 23:18 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 13:23 marostegui: Upgrade clouddb1018
* 23:17 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 13:15 effie: enable puppet on canaries
* 23:17 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:10 effie: disable puppet on canaries to deploy 699908
* 23:16 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 10:45 XioNoX: re-enable cr1-codfw:xe-5/1/2
* 22:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:42 XioNoX: cr1-codfw# set interfaces xe-5/1/2 disable
* 22:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080', diff saved to https://phabricator.wikimedia.org/P16533 and previous config saved to /var/cache/conftool/dbconfig/20210615-092511-marostegui.json
* 22:38 legoktm@deploy1002: Finished scap: i18n for restoring deprecated token APIs (duration: 15m 30s)
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318, db2082', diff saved to https://phabricator.wikimedia.org/P16532 and previous config saved to /var/cache/conftool/dbconfig/20210615-092409-marostegui.json
* 22:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P16531 and previous config saved to /var/cache/conftool/dbconfig/20210615-090802-marostegui.json
* 22:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2083', diff saved to https://phabricator.wikimedia.org/P16530 and previous config saved to /var/cache/conftool/dbconfig/20210615-090650-marostegui.json
* 22:23 legoktm@deploy1002: Started scap: i18n for restoring deprecated token APIs
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2084', diff saved to https://phabricator.wikimedia.org/P16529 and previous config saved to /var/cache/conftool/dbconfig/20210615-090243-marostegui.json
* 22:21 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/includes/api/: Restore deprecated token APIs (3/3) (duration: 00m 56s)
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2081', diff saved to https://phabricator.wikimedia.org/P16528 and previous config saved to /var/cache/conftool/dbconfig/20210615-090206-marostegui.json
* 22:19 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/autoload.php: Restore deprecated token APIs (2/3) (duration: 00m 56s)
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2082', diff saved to https://phabricator.wikimedia.org/P16527 and previous config saved to /var/cache/conftool/dbconfig/20210615-085953-marostegui.json
* 22:16 legoktm@deploy1002: Synchronized php-1.37.0-wmf.23/includes/api/ApiTokens.php: Restore deprecated token APIs (1/3) (duration: 00m 56s)
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091', diff saved to https://phabricator.wikimedia.org/P16526 and previous config saved to /var/cache/conftool/dbconfig/20210615-085938-marostegui.json
* 21:22 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080 db2083 db2084 db2091', diff saved to https://phabricator.wikimedia.org/P16525 and previous config saved to /var/cache/conftool/dbconfig/20210615-083233-marostegui.json
* 21:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4004.ulsfo.wmnet with reason: REIMAGE
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P16524 and previous config saved to /var/cache/conftool/dbconfig/20210615-082857-marostegui.json
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:10 XioNoX: roll OSPF link-protection to all routers - [[phab:T167306|T167306]]
* 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:30 eileen: civicrm revision changed from {{Gerrit|d9d61dad0b}} to {{Gerrit|acbcce94a2}}, config revision is {{Gerrit|2aed6ff89b}}
* 20:49 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:721610{{!}}Set jQuery migrate to false for wikibooks and Commons (T280944)]] (duration: 00m 56s)
* 01:22 eileen: civicrm revision changed from {{Gerrit|28ace1b86f}} to {{Gerrit|d9d61dad0b}}, config revision is {{Gerrit|2aed6ff89b}}
* 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:37 eileen: civicrm revision changed from {{Gerrit|31d07115a0}} to {{Gerrit|28ace1b86f}}, config revision is {{Gerrit|2aed6ff89b}}
* 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.23
* 18:55 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:50 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:49 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:46 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 18:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/addlink/AddLinkArticleTarget.js: {{Gerrit|bb8cba102fe417e8e41b7c4e9179d119c7d25a43}}: Use growthexperiments-structuredtask-no-suggestions-found-dialog-button in outdated suggestions dialog (2/2) (duration: 01m 06s)
* 18:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/extension.json: {{Gerrit|bb8cba102fe417e8e41b7c4e9179d119c7d25a43}}: Use growthexperiments-structuredtask-no-suggestions-found-dialog-button in outdated suggestions dialog (1/2) (duration: 01m 07s)
* 17:54 volans: turn off lldp agent on NIC (both ports) on ms-be105[1-9],ms-be205[2-6] - [[phab:T290984|T290984]]
* 17:31 volans: turn off lldp agent on NIC (both ports) on ms-be2051 - [[phab:T290984|T290984]]
* 17:09 jynus: deployed extra grants for admin user on s6 primary
* 16:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-test-coord1002.eqiad.wmnet
* 16:17 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-test-coord1002.eqiad.wmnet
* 16:04 marostegui: Disconnect s6 master from m5 master (noting the replication position) [[phab:T167973|T167973]]
* 16:04 marostegui: Disconnect s6 master from m5 master (noting the replication position)
* 15:52 bd808: marostegui is awesome and made wikitech better today. :)
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Set wikitech on read-only for maintenance [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P17283 and previous config saved to /var/cache/conftool/dbconfig/20210916-150444-marostegui.json
* 15:03 marostegui: Set wikitech on read-only (from now on all SAL changes will fail) [[phab:T167973|T167973]]
* 14:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
* 14:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
* 14:53 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
* 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2002.codfw.wmnet with reason: REIMAGE
* 14:35 mutante: reimaging mwmaint2002 to buster ([[phab:T267607|T267607]], [[phab:T245757|T245757]])
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
* 14:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint2002.codfw.wmnet with reason: reimage
* 14:12 mutante: switching https://noc.wikimedia.org from codfw to eqiad ([[phab:T287539|T287539]], [[phab:T267607|T267607]])
* 13:44 sukhe: homer: running for Gerrit: 721018: set up BGP peering to durum hosts in <nowiki>{</nowiki>eqiad,codfw,esams,ulsfo,eqsin<nowiki>}</nowiki>
* 13:25 effie: pool mw1422 mw1455
* 13:24 effie: pool mw1422 mw1455
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:12 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.23 (duration: 01m 04s)
* 13:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.23
* 12:08 marostegui: Deploy schema change on s2 codfw (lag will show up) [[phab:T290057|T290057]]
* 12:00 mbsantos: start OSM re-import script in maps2009 (depooled)
* 11:51 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/includes/MentorDashboard/MenteeOverview/UncachedMenteeOverviewDataProvider.php: {{Gerrit|529f86c5a998820c32e7d7f2d952317080383e05}}: UncachedMenteeOverviewDataProvider: Do not fatal with zero mentees ([[phab:T291088|T291088]]) (duration: 01m 04s)
* 11:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/GrowthExperiments/includes/MentorDashboard/MenteeOverview/UncachedMenteeOverviewDataProvider.php: {{Gerrit|9e0f6f84240bf621e97806a94a0e786817001668}}: UncachedMenteeOverviewDataProvider: Do not fatal with zero mentees ([[phab:T291088|T291088]]) (duration: 01m 04s)
* 11:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/AbuseFilter/: Fixing incorrect deployment of {{Gerrit|01e4450}} for [[phab:T291123|T291123]]. This is supposed to be a no-op. (duration: 01m 05s)
* 11:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:41 urbanecm: [urbanecm@deploy1002 /srv/mediawiki-staging/php-1.37.0-wmf.23 (wmf/1.37.0-wmf.23 * u+2-2)]$ git rebase &&  git submodule update extensions/AbuseFilter/ # fixing an incorrect deployment that happened in [[phab:T291123|T291123]]
* 11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:41 urbanecm: [urbanecm@deploy1002 /srv/mediawiki-staging/php-1.37.0-wmf.23/extensions/AbuseFilter (wmf/1.37.0-wmf.23 u=)]$ git co {{Gerrit|0d2bc7ca17b9f767ae5753db7e4e41fd9e7d3531}} # reset repo to expected state, fixing incorrect deploy of a backport in [[phab:T291123|T291123]]
* 11:34 moritzm: installing 4.9.272 kernels on stretch hosts (no reboots yet)
* 11:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 11:21 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:721305{{!}}Add new WikimediaBadges config (T232927)]] (2/2) (duration: 01m 05s)
* 11:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:721305{{!}}Add new WikimediaBadges config (T232927)]] (1/2) (duration: 01m 05s)
* 11:03 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 11:03 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 10:59 hashar@deploy1002: Synchronized php-1.37.0-wmf.21/includes/language/Message.php: Message: Remove deprecated format property - [[phab:T146416|T146416]] [[phab:T291124|T291124]] (duration: 01m 06s)
* 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:21 topranks: Changing default gateway on mw1422 to use VRRP backup (cr2), to determine if tail drops from switches to cr1 is cause of TCP retransmissions.
* 10:14 effie: depool mw1455 for network testing
* 10:11 effie: depool mw1422 for network testing
* 10:01 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 10:01 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 10:00 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 10:00 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for mx2002.wikimedia.org: Renew puppet certificate - jmm@cumin2002
* 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2002.wikimedia.org with reason: reimage
* 09:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2002.wikimedia.org with reason: reimage
* 09:10 moritzm: in-place re-installation of mx2002.wikimedia.org (test VM) to test the new installer key support in the sre.puppet.renew-cert cookbook
* 08:04 moritzm: upgrading scandium to PHP 7.2 backport of patch for enhanced DOM replaceChild/removeChild performance [[phab:T291052|T291052]]
* 07:48 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
* 05:35 marostegui: Optimize dewiki.logging in codfw [[phab:T287344|T287344]]


== 2021-06-14 ==
== 2021-09-15 ==
* 21:40 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@baeee47]: [[phab:T261407|T261407]] bulk_daemon: Deploy prioritized topics (duration: 00m 49s)
* 23:02 legoktm: upgrading lists1001 to use postorius 1.3.5
* 21:40 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@baeee47]: [[phab:T261407|T261407]] bulk_daemon: Deploy prioritized topics
* 22:51 legoktm: uploaded new mailmanclient/postorius packages to apt1001
* 19:27 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1003.eqiad.wmnet
* 22:38 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 19:21 twentyafterfour_: applying hotfix for [[phab:T284397|T284397]] and restarting php7.3-fpm on phab1001
* 22:03 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 18:30 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1003.eqiad.wmnet
* 22:03 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 17:05 jforrester@deploy1002: Finished deploy [integration/docroot@22061b6]: Actually add mediawiki/tools/api-testing JSDoc to doc.wikimedia for [[phab:T236915|T236915]] (duration: 00m 07s)
* 22:03 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 17:05 jforrester@deploy1002: Started deploy [integration/docroot@22061b6]: Actually add mediawiki/tools/api-testing JSDoc to doc.wikimedia for [[phab:T236915|T236915]]
* 22:02 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@902529b]: 0.3.85 (duration: 06m 59s)
* 16:46 jforrester@deploy1002: Finished deploy [integration/docroot@ca7af97]: Add mediawiki/tools/api-testing JSDoc to doc.wikimedia for [[phab:T236915|T236915]] (duration: 00m 07s)
* 21:56 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.85` on canary `wdqs1003`; proceeding to rest of fleet
* 16:46 jforrester@deploy1002: Started deploy [integration/docroot@ca7af97]: Add mediawiki/tools/api-testing JSDoc to doc.wikimedia for [[phab:T236915|T236915]]
* 21:55 ryankemper@deploy1002: Started deploy [wdqs/wdqs@902529b]: 0.3.85
* 15:56 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1002.eqiad.wmnet
* 21:55 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.85`. Pre-deploy tests passing on canary `wdqs1003`
* 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16521 and previous config saved to /var/cache/conftool/dbconfig/20210614-155258-root.json
* 21:42 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@f3473d9]: Reference files deployed by puppet through query_service paths instead of wdqs (duration: 02m 07s)
* 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16520 and previous config saved to /var/cache/conftool/dbconfig/20210614-153754-root.json
* 21:40 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@f3473d9]: Reference files deployed by puppet through query_service paths instead of wdqs
* 15:24 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16519 and previous config saved to /var/cache/conftool/dbconfig/20210614-152250-root.json
* 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1005.eqiad.wmnet
* 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16518 and previous config saved to /var/cache/conftool/dbconfig/20210614-150747-root.json
* 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1005.eqiad.wmnet
* 21:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|60e7e515d7034a9f839d78851f1dcc2be3df7f3b}}: Set wmgEchoEnablePush to false explicitly on arbcom_* wikis ([[phab:T291128|T291128]]) (duration: 01m 06s)
* 15:04 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1002.eqiad.wmnet
* 19:50 twentyafterfour@deploy1002: Synchronized php-1.37.0-wmf.23/extensions/AbuseFilter/: sync backport for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/721312 (duration: 01m 06s)
* 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1004.eqiad.wmnet
* 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1004.eqiad.wmnet
* 19:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 10%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16517 and previous config saved to /var/cache/conftool/dbconfig/20210614-145243-root.json
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1003.eqiad.wmnet
* 19:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16516 and previous config saved to /var/cache/conftool/dbconfig/20210614-145039-root.json
* 19:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: Rollback all wikis to 1.37.0-wmf.23
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1003.eqiad.wmnet
* 19:07 urbanecm: Re-start server-side upload for 1 video file, likely temporary swift failure ([[phab:T289781|T289781]])
* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16515 and previous config saved to /var/cache/conftool/dbconfig/20210614-144130-marostegui.json
* 19:06 urbanecm: Start server-side upload for 1 video file ([[phab:T287686|T287686]])
* 14:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1002.eqiad.wmnet
* 19:04 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.23 (duration: 00m 55s)
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16514 and previous config saved to /var/cache/conftool/dbconfig/20210614-143536-root.json
* 19:03 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.23
* 14:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1002.eqiad.wmnet
* 18:52 urbanecm: Start server-side upload for 1 video file ([[phab:T289949|T289949]])
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16513 and previous config saved to /var/cache/conftool/dbconfig/20210614-143224-root.json
* 18:50 urbanecm: Start server-side upload for 1 video file ([[phab:T289781|T289781]])
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16512 and previous config saved to /var/cache/conftool/dbconfig/20210614-143211-root.json
* 18:44 urbanecm: Start server-side upload for 3 large PDF files ([[phab:T290722|T290722]])
* 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1001.eqiad.wmnet
* 18:43 legoktm: migrated sitereq-l@ from Google Groups to Mailman ([[phab:T290908|T290908]])
* 14:27 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate CentralNotice<nowiki>{</nowiki>BannerHistory,Impression<nowiki>}</nowiki> to EventGate on all wikis - [[phab:T271168|T271168]] (duration: 00m 57s)
* 18:27 urbanecm: Start server-side upload for 1 video file ([[phab:T290290|T290290]])
* 14:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1001.eqiad.wmnet
* 18:23 urbanecm: Start server-side upload for 1 video file ([[phab:T290685|T290685]])
* 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2007.codfw.wmnet
* 18:21 urbanecm: Start server-side upload for 1 video file ([[phab:T290707|T290707]])
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16511 and previous config saved to /var/cache/conftool/dbconfig/20210614-142032-root.json
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16510 and previous config saved to /var/cache/conftool/dbconfig/20210614-142014-root.json
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16509 and previous config saved to /var/cache/conftool/dbconfig/20210614-141720-root.json
* 18:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16508 and previous config saved to /var/cache/conftool/dbconfig/20210614-141707-root.json
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:17 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate CentralNotice<nowiki>{</nowiki>BannerHistory,Impression<nowiki>}</nowiki> to EventGate on testwiki - [[phab:T271168|T271168]] (duration: 00m 57s)
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7620084a1ed92066aa8b29fa609cf6cbb4f799ab}}: Add portrattarkiv.se to wgCopyUploadsDomains whitelist of Wikimedia Commons ([[phab:T290581|T290581]]) (duration: 01m 05s)
* 14:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2007.codfw.wmnet
* 17:39 mutante: thumbor - running puppet on all thumbor hosts, removed cron job systemd-thumbor-tmpfiles-clean, added thumbor_systemd_tmpfiles_clean timer job
* 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2006.codfw.wmnet
* 16:56 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f7f6f3] (duration: 06m 15s)
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16507 and previous config saved to /var/cache/conftool/dbconfig/20210614-140529-root.json
* 16:50 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0f7f6f3]
* 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16506 and previous config saved to /var/cache/conftool/dbconfig/20210614-140511-root.json
* 16:47 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3] (thin): Regular analytics weekly train THIN [analytics/refinery@0f7f6f3] (duration: 00m 07s)
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16505 and previous config saved to /var/cache/conftool/dbconfig/20210614-140217-root.json
* 16:47 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3] (thin): Regular analytics weekly train THIN [analytics/refinery@0f7f6f3]
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16504 and previous config saved to /var/cache/conftool/dbconfig/20210614-140203-root.json
* 16:45 joal@deploy1002: Finished deploy [analytics/refinery@0f7f6f3]: Regular analytics weekly train [analytics/refinery@0f7f6f3] (duration: 19m 43s)
* 14:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2006.codfw.wmnet
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5002.eqsin.wmnet
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16503 and previous config saved to /var/cache/conftool/dbconfig/20210614-135456-root.json
* 16:26 joal@deploy1002: Started deploy [analytics/refinery@0f7f6f3]: Regular analytics weekly train [analytics/refinery@0f7f6f3]
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 10%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16502 and previous config saved to /var/cache/conftool/dbconfig/20210614-135025-root.json
* 16:19 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum5002.eqsin.wmnet
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16501 and previous config saved to /var/cache/conftool/dbconfig/20210614-135007-root.json
* 16:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum5001.eqsin.wmnet
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16500 and previous config saved to /var/cache/conftool/dbconfig/20210614-134713-root.json
* 16:02 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum5001.eqsin.wmnet
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16499 and previous config saved to /var/cache/conftool/dbconfig/20210614-134700-root.json
* 15:56 urbanecm: Remove 2FA for User:Rho at wikitech, identity verified via a videocall
* 13:43 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 14:50 moritzm: installing lz4 security updates on stretch
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16498 and previous config saved to /var/cache/conftool/dbconfig/20210614-133953-root.json
* 13:50 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16497 and previous config saved to /var/cache/conftool/dbconfig/20210614-133801-marostegui.json
* 13:33 ottomata: pointing <nowiki>{</nowiki>stats,analytics<nowiki>}</nowiki>.wikimedia.org at analytics-web.discovery.wmnet cname - [[phab:T285355|T285355]]
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16496 and previous config saved to /var/cache/conftool/dbconfig/20210614-133503-root.json
* 13:32 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum4002.ulsfo.wmnet
* 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16495 and previous config saved to /var/cache/conftool/dbconfig/20210614-133442-root.json
* 13:18 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum4002.ulsfo.wmnet
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16494 and previous config saved to /var/cache/conftool/dbconfig/20210614-133210-root.json
* 13:15 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum4001.ulsfo.wmnet
* 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 10%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16493 and previous config saved to /var/cache/conftool/dbconfig/20210614-133156-root.json
* 13:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum4001.ulsfo.wmnet
* 13:29 effie: restart memcached on codfw
* 12:54 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16492 and previous config saved to /var/cache/conftool/dbconfig/20210614-132449-root.json
* 11:41 marostegui: Install 10.4.21-2 on db1125
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312 db1170:3317 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16491 and previous config saved to /var/cache/conftool/dbconfig/20210614-132235-marostegui.json
* 11:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16490 and previous config saved to /var/cache/conftool/dbconfig/20210614-132000-root.json
* 11:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16489 and previous config saved to /var/cache/conftool/dbconfig/20210614-131938-root.json
* 11:21 Lucas_WMDE: EU backport+config window done
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16488 and previous config saved to /var/cache/conftool/dbconfig/20210614-130946-root.json
* 11:20 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720983{{!}}Enable change-tags for new edits' proofread status at mulWS (T289140)]] (duration: 01m 06s)
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16487 and previous config saved to /var/cache/conftool/dbconfig/20210614-130723-marostegui.json
* 11:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16486 and previous config saved to /var/cache/conftool/dbconfig/20210614-130547-root.json
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:583407{{!}}Don’t check constraints on two property qualifiers (T235292)]] (duration: 01m 11s)
* 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16485 and previous config saved to /var/cache/conftool/dbconfig/20210614-130435-root.json
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16484 and previous config saved to /var/cache/conftool/dbconfig/20210614-125442-root.json
* 10:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1010.eqiad.wmnet
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16483 and previous config saved to /var/cache/conftool/dbconfig/20210614-125043-root.json
* 09:55 effie: depool wtp1026
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16482 and previous config saved to /var/cache/conftool/dbconfig/20210614-124931-root.json
* 09:54 effie: depooling mw1312 and mw1319
* 12:37 XioNoX: configure OSPF link-protection on cr3/4-ulsfo - [[phab:T167306|T167306]]
* 09:46 topranks: Disabling Intel X710 NIC on-board LLDP processing on relforge1003 ([[phab:T290984|T290984]])
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16481 and previous config saved to /var/cache/conftool/dbconfig/20210614-123539-root.json
* 07:04 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1033 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16480 and previous config saved to /var/cache/conftool/dbconfig/20210614-123512-marostegui.json
* 06:57 elukey: shutdown ms-be2045 (again) after seeing [[phab:T290881|T290881]]
* 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16479 and previous config saved to /var/cache/conftool/dbconfig/20210614-123427-root.json
* 06:02 elukey: powercycle ms-be2045 - no ssh, no remote tty available
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Restore es1028 original weight', diff saved to https://phabricator.wikimedia.org/P16478 and previous config saved to /var/cache/conftool/dbconfig/20210614-122322-marostegui.json
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Restore db1109 original load', diff saved to https://phabricator.wikimedia.org/P17274 and previous config saved to /var/cache/conftool/dbconfig/20210915-052802-marostegui.json
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to es1028 while es1034 gets upgraded', diff saved to https://phabricator.wikimedia.org/P16477 and previous config saved to /var/cache/conftool/dbconfig/20210614-122242-marostegui.json
* 04:30 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1109 load', diff saved to https://phabricator.wikimedia.org/P17273 and previous config saved to /var/cache/conftool/dbconfig/20210915-043053-marostegui.json
* 12:22 dcausse: re-pooling wdqs1012
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1034 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16476 and previous config saved to /var/cache/conftool/dbconfig/20210614-122212-marostegui.json
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16475 and previous config saved to /var/cache/conftool/dbconfig/20210614-122036-root.json
* 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2005.codfw.wmnet
* 12:17 XioNoX: configure OSPF link-protection on cr3-ulsfo:xe-0/1/1 - [[phab:T167306|T167306]]
* 12:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2005.codfw.wmnet
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P16474 and previous config saved to /var/cache/conftool/dbconfig/20210614-121101-marostegui.json
* 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16473 and previous config saved to /var/cache/conftool/dbconfig/20210614-121031-marostegui.json
* 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2004.codfw.wmnet
* 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2004.codfw.wmnet
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16472 and previous config saved to /var/cache/conftool/dbconfig/20210614-120112-marostegui.json
* 11:28 effie: restart memcached on mc2019
* 11:09 effie: restart memcached on codfw memcached gutter pool (mc-gp2* hosts)
* 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2003.codfw.wmnet
* 10:52 topranks: [[phab:T283163|T283163]]: Adding "metric-out minimum-igp" to all internal/Confed BGP groups on CR routers.
* 10:46 effie: enable puppet on mc*
* 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2003.codfw.wmnet
* 10:39 effie: disable puppet on mc* hosts
* 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2001.codfw.wmnet
* 10:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2001.codfw.wmnet
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16471 and previous config saved to /var/cache/conftool/dbconfig/20210614-101839-root.json
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16469 and previous config saved to /var/cache/conftool/dbconfig/20210614-100336-root.json
* 09:56 jbond@deploy1002: Finished deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 (duration: 02m 37s)
* 09:54 jbond@deploy1002: Started deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16467 and previous config saved to /var/cache/conftool/dbconfig/20210614-094832-root.json
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16466 and previous config saved to /var/cache/conftool/dbconfig/20210614-093329-root.json
* 09:22 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for schema change', diff saved to https://phabricator.wikimedia.org/P16465 and previous config saved to /var/cache/conftool/dbconfig/20210614-092234-marostegui.json
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16464 and previous config saved to /var/cache/conftool/dbconfig/20210614-092125-root.json
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16463 and previous config saved to /var/cache/conftool/dbconfig/20210614-090622-root.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16462 and previous config saved to /var/cache/conftool/dbconfig/20210614-085118-root.json
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16461 and previous config saved to /var/cache/conftool/dbconfig/20210614-083614-root.json
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 for schema change', diff saved to https://phabricator.wikimedia.org/P16460 and previous config saved to /var/cache/conftool/dbconfig/20210614-081239-marostegui.json
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16459 and previous config saved to /var/cache/conftool/dbconfig/20210614-081031-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2148', diff saved to https://phabricator.wikimedia.org/P16458 and previous config saved to /var/cache/conftool/dbconfig/20210614-080552-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16456 and previous config saved to /var/cache/conftool/dbconfig/20210614-075528-root.json
* 07:51 marostegui: Depool clouddb1013 to upgrade mysql
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16455 and previous config saved to /var/cache/conftool/dbconfig/20210614-074024-root.json
* 07:30 marostegui: Reboot db2148 [[phab:T284852|T284852]]
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2148 [[phab:T284852|T284852]]', diff saved to https://phabricator.wikimedia.org/P16454 and previous config saved to /var/cache/conftool/dbconfig/20210614-072930-marostegui.json
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16453 and previous config saved to /var/cache/conftool/dbconfig/20210614-072520-root.json
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P16452 and previous config saved to /var/cache/conftool/dbconfig/20210614-071839-marostegui.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16451 and previous config saved to /var/cache/conftool/dbconfig/20210614-071742-root.json
* 07:15 dcausse: restart blazegraph and depool wdqs1012
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16450 and previous config saved to /var/cache/conftool/dbconfig/20210614-070238-root.json
* 07:01 moritzm: restarting mw canaries to pick up libwebp security updates
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16449 and previous config saved to /var/cache/conftool/dbconfig/20210614-064734-root.json
* 06:39 moritzm: installing libwebp security updates on buster
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16448 and previous config saved to /var/cache/conftool/dbconfig/20210614-063231-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 for schema change', diff saved to https://phabricator.wikimedia.org/P16447 and previous config saved to /var/cache/conftool/dbconfig/20210614-062554-marostegui.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 100%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16446 and previous config saved to /var/cache/conftool/dbconfig/20210614-061226-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16445 and previous config saved to /var/cache/conftool/dbconfig/20210614-060119-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 75%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16444 and previous config saved to /var/cache/conftool/dbconfig/20210614-055723-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16443 and previous config saved to /var/cache/conftool/dbconfig/20210614-054615-root.json
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 50%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16442 and previous config saved to /var/cache/conftool/dbconfig/20210614-054219-root.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16441 and previous config saved to /var/cache/conftool/dbconfig/20210614-053112-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 25%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16440 and previous config saved to /var/cache/conftool/dbconfig/20210614-052715-root.json
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P16439 and previous config saved to /var/cache/conftool/dbconfig/20210614-051930-marostegui.json
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16438 and previous config saved to /var/cache/conftool/dbconfig/20210614-051608-root.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P16437 and previous config saved to /var/cache/conftool/dbconfig/20210614-051522-marostegui.json


== 2021-06-12 ==
== 2021-09-14 ==
* 13:49 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: alert noise, no impact, x2 is unused
* 23:01 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Re-enable VipsScaler (2 of 2) (duration: 01m 04s)
* 13:49 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: alert noise, no impact, x2 is unused
* 22:59 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Re-enable VipsScaler (1 of 2) (duration: 01m 05s)
* 22:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:43 legoktm: legoktm@cumin2001:~$ sudo systemctl reset-failed # clear httpbb_hourly_tests failure, moved to cumin1001
* 22:34 legoktm@deploy1002: Finished scap: Rebuild i18n for redeployment of VipsScaler ([[phab:T290759|T290759]]) (duration: 23m 49s)
* 22:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:11 legoktm@deploy1002: Started scap: Rebuild i18n for redeployment of VipsScaler ([[phab:T290759|T290759]])
* 22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:20 dancy: testing upcoming Scap release on beta
* 20:20 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720387{{!}}Early adopt wgIncludejQueryMigrate=false on nlwiki (T280944)]] (duration: 01m 48s)
* 20:06 cdanis: [[phab:T290425|T290425]] ✔️ cdanis@alert1001.wikimedia.org ~ 🕓🍵 sudo /usr/bin/statograph -c /etc/statograph/config.yml erase_metric_data lyfcttm2lhw4
* 20:06 cdanis: [[phab:T290425|T290425]] ✔️ cdanis@alert1001.wikimedia.org ~ 🕓🍵 sudo /usr/bin/statograph -c /etc/statograph/config.yml erase_metric_data h5mvbny28713
* 19:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.23
* 18:48 moritzm: removed filter for tcp/25 on mx2001, reimage is complete [[phab:T286911|T286911]]
* 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2982638039720107d0b6e3227f5dce5b34ce7533}}: Offer the DiscussionTools reply tool as opt-out setting at ptwikinews ([[phab:T285162|T285162]]) (duration: 01m 06s)
* 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7f1de32f4b5788e92291a5448563bc61a9f561e2}}: Offer the DiscussionTools reply tool as opt-out setting at Wikimania wiki ([[phab:T284339|T284339]]) (duration: 01m 05s)
* 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e36f4d3dcc368f0afbce3649ce72f2135ab1c76f}}: DiscussionTools: Make newtopictool available to everyone on arwiki and cswiki ([[phab:T285724|T285724]]) (duration: 01m 04s)
* 18:09 urbanecm@deploy1002: Synchronized debug.json: {{Gerrit|Idef64e72}} (duration: 01m 29s)
* 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: reimage
* 17:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: reimage
* 17:45 moritzm: reimaging mx2001 to bullseye [[phab:T286911|T286911]]
* 16:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:53 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
* 15:53 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
* 15:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1010.eqiad.wmnet
* 15:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 37 hosts
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.remove-downtime for 37 hosts
* 15:11 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-update-tendril (exit_code=0)
* 15:11 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-update-tendril
* 15:10 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
* 15:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:07 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
* 15:06 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
* 15:05 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1109 load', diff saved to https://phabricator.wikimedia.org/P17271 and previous config saved to /var/cache/conftool/dbconfig/20210914-150458-marostegui.json
* 15:03 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:00 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 14:58 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1109 load', diff saved to https://phabricator.wikimedia.org/P17270 and previous config saved to /var/cache/conftool/dbconfig/20210914-145522-marostegui.json
* 14:54 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 14:54 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 14:53 jelto@cumin2002: END (ERROR) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=97)
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1109 load', diff saved to https://phabricator.wikimedia.org/P17269 and previous config saved to /var/cache/conftool/dbconfig/20210914-145324-marostegui.json
* 14:52 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 14:49 jelto@cumin2002: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=99)
* 14:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:49 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
* 14:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:46 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 14:46 jelto@cumin2002: MediaWiki read-only period ends at: 2021-09-14 14:46:30.570035
* 14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 14:45 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 14:45 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 14:44 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 14:44 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 14:44 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 14:44 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 14:43 jelto@cumin2002: MediaWiki read-only period starts at: 2021-09-14 14:43:48.272827
* 14:43 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 14:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: DC switchover
* 14:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: DC switchover
* 14:39 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 14:39 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 14:34 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 14:32 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 14:30 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 14:24 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 14:22 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 14:22 jelto@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:10 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Avoid warning about undefined $wgFileBlacklist ([[phab:T290640|T290640]]) (duration: 01m 32s)
* 13:44 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15 (duration: 00m 10s)
* 13:43 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15
* 13:43 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@79bc0c6]: geoshapes: update table names (duration: 00m 14s)
* 13:42 mbsantos@deploy1002: Started deploy [kartotherian/deploy@79bc0c6]: geoshapes: update table names
* 13:27 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15 (duration: 00m 10s)
* 13:27 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: kartotherian: restore v4 maxzoom to z15
* 13:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@1ebdca4]: (no justification provided) (duration: 00m 15s)
* 13:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@1ebdca4]: (no justification provided)
* 12:32 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 12:32 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 12:29 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 12:29 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 12:19 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 12:19 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 12:17 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 12:17 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 11:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
* 10:31 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 10:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.20 (duration: 01m 48s)
* 09:47 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.19 (duration: 04m 13s)
* 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
* 09:38 hashar@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.23 (duration: 70m 39s)
* 09:29 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
* 09:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
* 09:09 Emperor: swift rebalance to remove h/w faulty host ms-be2045 [[phab:T290881|T290881]]
* 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:47 moritzm: installing testvm2002
* 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 08:28 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 08:27 hashar@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.23
* 08:25 godog: poweroff ms-be2045 and set it as failed in netbox - [[phab:T290881|T290881]]
* 08:24 hashar: train: applied security patches for 1.37.0-wmf.23  # [[phab:T281164|T281164]]
* 08:05 godog: wipe non-os partitions from ms-be2045 - [[phab:T290881|T290881]]
* 07:50 vgutierrez: update acme-chief to version 0.31 on acmechief hosts - [[phab:T290249|T290249]]
* 04:47 eileen: civicrm revision changed from {{Gerrit|1f071f6c6c}} to {{Gerrit|e6bf81d99c}}, config revision is {{Gerrit|23eda8ba3a}}
* 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:07 James_F: wmf/1.37.0-wmf.23 was branched at {{Gerrit|ea72c9b690c2159a12beec2f518b61cc499ed521}} for [[phab:T281164|T281164]]
* 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2021-06-11 ==
== 2021-09-13 ==
* 23:37 mutante: removing firewall hole for mgmt networks to install* because it turned out it can't be used for firmware upgrades
* 23:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:08 brennen: gitlab.wikimedia.org currently up with recommended config applied; test data deleted; users can register but not create projects. brennen, dancy, and thcipriani currently marked as admins. may need to reset data again, but hopefully not.
* 23:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2014.codfw.wmnet with reason: REIMAGE
* 23:45 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T290759|T290759]]: Undeploy VipsScaler: III – Don't set wmgUseVips, now ignored (duration: 00m 58s)
* 21:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2014.codfw.wmnet with reason: REIMAGE
* 23:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2013.codfw.wmnet with reason: REIMAGE
* 23:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2013.codfw.wmnet with reason: REIMAGE
* 23:41 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T290759|T290759]]: Undeploy VipsScaler: II – Don't load regardless of config (duration: 00m 58s)
* 20:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2012.codfw.wmnet with reason: REIMAGE
* 19:52 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T290759|T290759]] Undeploy VipsScaler: I – Disable on all wikis (duration: 00m 57s)
* 20:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2012.codfw.wmnet with reason: REIMAGE
* 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2011.codfw.wmnet with reason: REIMAGE
* 19:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2011.codfw.wmnet with reason: REIMAGE
* 19:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1004
* 18:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1004
* 18:59 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript resetAuthenticationThrottle.php --wiki=<nowiki>{</nowiki>cswiki,cswikiversity<nowiki>}</nowiki> --signup --ip=185.47.223.49 # [[phab:T290809|T290809]]
* 15:01 reedy@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/MediaSearch/extension.json: Make MediaSearch default search experience for all users (duration: 00m 57s)
* 18:58 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: {{Gerrit|9db1d1ac938ca053c82fed88c8b6e75f97a52416}}: Add throttle rule for Czech wiki course ([[phab:T290809|T290809]]) (duration: 00m 58s)
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16432 and previous config saved to /var/cache/conftool/dbconfig/20210611-150018-root.json
* 18:29 ryankemper: [Cirrus] `eqiad` fully recovered (100% of shards), `codfw` at 99.816%. `codfw` is getting held up by recovery of `enwiki` shards which tend to be quite large
* 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16431 and previous config saved to /var/cache/conftool/dbconfig/20210611-144514-root.json
* 18:25 razzi: reenable replication on dbstore1007 for [[phab:T290841|T290841]]
* 14:44 mbsantos@deploy1002: Finished deploy [tilerator/deploy@6bfdab5]: (no justification provided) (duration: 00m 05s)
* 18:16 cwhite: apply high log volume from ES mitigations to deprecated inputs
* 14:44 mbsantos@deploy1002: Started deploy [tilerator/deploy@6bfdab5]: (no justification provided)
* 18:13 razzi: razzi@dbstore1007:~$ sudo systemctl restart mariadb@s3.service for [[phab:T290841|T290841]]
* 14:43 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@5d7c993]: (no justification provided) (duration: 00m 05s)
* 18:05 razzi: sudo systemctl restart mariadb@s2.service
* 14:42 mbsantos@deploy1002: Started deploy [kartotherian/deploy@5d7c993]: (no justification provided)
* 17:48 ryankemper: [Cirrus] `eqiad` is at 99.13% shards recovered and `codfw` is at 98.83%
* 14:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
* 17:20 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:36 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
* 17:17 ryankemper: [Cirrus] `enwiki` searches appear to be working now. `production-search-eqiad` is at 93.5% recovered shards, `production-search-codfw` is at 95.3% recovered
* 14:35 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 16:57 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
* 14:35 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 16:18 legoktm@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=eventgate-main
* 14:34 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:16 volans@cumin1001: conftool action : set/pooled=yes; selector: name=mw1414.*
* 14:34 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:08 volans@cumin1001: conftool action : set/pooled=no; selector: name=mw1414.*
* 14:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
* 16:06 volans@cumin1001: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host mw1414.eqiad.wmnet
* 14:33 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:54 moritzm: filtered mx2001 on the routers for reimage [[phab:T286911|T286911]]
* 14:33 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:43 vgutierrez: update acme-chief to version 0.31 on acmechief-test hosts - [[phab:T290249|T290249]]
* 14:32 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:40 vgutierrez: upload acme-chief 0.31 to apt.wm.o (buster) - [[phab:T290249|T290249]]
* 14:31 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:32 jelto: Traffic: depool codfw from user traffic
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16430 and previous config saved to /var/cache/conftool/dbconfig/20210611-143010-root.json
* 15:26 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
* 14:22 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:25 jelto@cumin2002: START - Cookbook sre.switchdc.services.02-restore-ttl
* 14:22 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:25 volans@cumin1001: START - Cookbook sre.experimental.reimage for host mw1414.eqiad.wmnet
* 14:20 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:20 Emperor: rebooting ms-be2045 to see if that brings the disk back properly [[phab:T290881|T290881]]
* 14:20 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:13 jelto@cumin2002: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=restbase-async
* 14:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
* 15:13 legoktm: (contd.) box-constraints{{!}}similar-users{{!}}termbox{{!}}thanos-query{{!}}thanos-swift{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero)
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16429 and previous config saved to /var/cache/conftool/dbconfig/20210611-141506-root.json
* 15:13 rzl: (contd.) box-constraints{{!}}similar-users{{!}}termbox{{!}}thanos-query{{!}}thanos-swift{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero)
* 13:53 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
* 15:12 jelto@cumin2002: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(apertium{{!}}api-gateway{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventgate-main{{!}}eventstreams{{!}}eventstreams-internal{{!}}kartotherian{{!}}linkrecommendation{{!}}mathoid{{!}}mobileapps{{!}}ores{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}restbase{{!}}restbase-async{{!}}schema{{!}}search{{!}}sessionstore{{!}}shellbox{{!}}shell
* 13:53 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
* 15:02 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
* 13:53 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
* 15:02 topranks: Restarting unused line-card FPC 1 in cr2-codfw in attempt to clear alarm.
* 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16428 and previous config saved to /var/cache/conftool/dbconfig/20210611-135248-marostegui.json
* 14:56 jelto@cumin2002: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1153', diff saved to https://phabricator.wikimedia.org/P16427 and previous config saved to /var/cache/conftool/dbconfig/20210611-135036-marostegui.json
* 14:44 herron: drained mx2001 mail queue to mx1001 [[phab:T286911|T286911]]
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1153 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16426 and previous config saved to /var/cache/conftool/dbconfig/20210611-133527-marostegui.json
* 14:38 dcausse: restarting wdqs-updater.service on all wdqs servers
* 10:46 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 14:21 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
* 07:29 moritzm: restarting archiva to pick up OpenJDK security updates
* 14:20 jelto@cumin2002: START - Cookbook sre.switchdc.services.02-restore-ttl
* 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
* 14:13 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
* 07:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
* 14:13 legoktm: (contd.) ternal, eventgate-main, wikifeeds, eventstreams-internal, eventgate-analytics-external: codfw => eqiad
* 06:56 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:12 jelto@cumin2002: Switching services echostore, termbox, cxserver, eventstreams, search, ores, mathoid, schema, push-notifications, thanos-swift, wdqs, sessionstore, restbase, wdqs-internal, apertium, eventgate-analytics, citoid, api-gateway, restbase-async, proton, linkrecommendation, thanos-query, shellbox, kartotherian, mobileapps, recommendation-api, zotero, similar-users, shellbox-constraints, eventgate-logging-ex
* 05:56 elukey: rm -rf empty dir /etc/apache2/sites-enabled/.links2 on webperf1001 to avoid puppet changes at every run
* 14:12 jelto@cumin2002: START - Cookbook sre.switchdc.services.01-switch-dc
* 05:47 elukey: run systemctl reset-failed ifup@en5.service on doh1001 - [[phab:T273026|T273026]]
* 14:11 jelto@cumin2002: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
* 01:10 eileen: process-control config revision is {{Gerrit|2aed6ff89b}}
* 14:05 jelto@cumin2002: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
* 14:03 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3002.esams.wmnet
* 13:51 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum3002.esams.wmnet
* 13:50 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3001.esams.wmnet
* 13:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum3001.esams.wmnet
* 13:36 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum2002.codfw.wmnet
* 13:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum2002.codfw.wmnet
* 13:20 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum2001.codfw.wmnet
* 13:08 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum2001.codfw.wmnet
* 12:09 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:03 volans@cumin1001: START - Cookbook sre.dns.netbox
* 11:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:26 kostajh: European mid-day backport window deploys done
* 11:24 kharlan@deploy1002: Synchronized wmf-config: Config: [[gerrit:713553{{!}}WikimediaEvents: Remove UnderstandingFirstDay config]] (duration: 00m 59s)
* 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
* 10:43 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
* 10:15 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=93) for host mw1414.eqiad.wmnet
* 09:33 volans: restarting tcpircbot-logmsgbot on alert1001, not relaying messages
* 09:18 elukey: upgrade rsyslog* on ml-serve* nodes to 8.1901.0-1+wmf2
* 09:16 godog: swift eqiad-prod: add weight to ms-be10[64-67] - [[phab:T290546|T290546]]
* 09:11 moritzm: reimaging sretest1002
* 09:11 elukey: upload rsyslog* 8.1901.0-1+wmf2 to buster-wikimedia component/rsyslog-k8s - [[phab:T277739|T277739]]
* 08:16 godog: bump +100G prometheus/ops codfw


== 2021-06-10 ==
== 2021-09-12 ==
* 23:29 derick@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Citoid/modules/ve/ve.ui.CitoidInspector.js: Backport: [[gerrit:699288{{!}}CitoidInspector: rename getParameterNames to getOrderedParameterNames (T284786)]] (duration: 00m 57s)
* 18:33 vgutierrez: restart varnish-fe on cp3061, cp3063 and cp3065
* 21:40 urbanecm: End of urbanecm@mwmaint1002:~$ foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php discussiontools # [[phab:T282699|T282699]]
* 18:29 vgutierrez: restart varnish on cp3055
* 21:36 urbanecm: Start of urbanecm@mwmaint1002:~$ foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php discussiontools # [[phab:T282699|T282699]]
* 18:26 vgutierrez: restart varnish on cp3057
* 21:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=testwiki discussiontools # [[phab:T282699|T282699]]
* 04:53 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:13 mutante: installed tftp client on install1003 for debugging
* 04:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:00 jhuneidi@deploy1002: Pruned MediaWiki: 1.37.0-wmf.5 (duration: 03m 33s)
* 19:31 ryankemper: [[phab:T265547|T265547]] Cleanup following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/698025: `sudo -E cumin -b 5 'P:analytics::cluster::elasticsearch' 'sudo rm -rfv /etc/mjolnir /srv/deployment/search/mjolnir'`
* 19:09 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]]
* 18:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/WikimediaMaintenance/dumpInterwiki.php: {{Gerrit|b21904e326e917f5ac6d7129a4d224380c6e4c21}}: Remove sep11 interwiki link from dumpinterwiki.php (duration: 01m 08s)
* 18:45 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 23s)
* 18:39 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 03s)
* 18:38 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/UniversalLanguageSelector/resources/js/ext.uls.launch.js: {{Gerrit|8aeab139879613782548b20fc11af5e66589e30a}}: Fire language change hook ([[phab:T280770|T280770]]) (duration: 01m 07s)
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|d26968c1c3b3f3e115ff37a9a138d225cabba25a}}: wgWelcomeSurveyExperimentalGroups: Use new syntax in CS.php ([[phab:T284597|T284597]]; [[phab:T284735|T284735]]) (duration: 01m 08s)
* 17:11 moritzm: updating bullseye installer image to latest daily image (kernel ABI changed again) [[phab:T275873|T275873]]
* 17:09 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:06 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:53 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
* 16:51 moritzm: installing rails security updates
* 16:37 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: no-op for Beta {{Gerrit|I2a42c222003}} (duration: 01m 07s)
* 16:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:24 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
* 15:09 papaul: power down ms-be2038 for BBU replacement
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16417 and previous config saved to /var/cache/conftool/dbconfig/20210610-123201-root.json
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16416 and previous config saved to /var/cache/conftool/dbconfig/20210610-121657-root.json
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 60%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16415 and previous config saved to /var/cache/conftool/dbconfig/20210610-120153-root.json
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16414 and previous config saved to /var/cache/conftool/dbconfig/20210610-114650-root.json
* 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 40%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16413 and previous config saved to /var/cache/conftool/dbconfig/20210610-113146-root.json
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 30%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16412 and previous config saved to /var/cache/conftool/dbconfig/20210610-111643-root.json
* 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 20%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16411 and previous config saved to /var/cache/conftool/dbconfig/20210610-110139-root.json
* 11:00 jbond@deploy1002: Finished deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 to netbox-next (duration: 00m 53s)
* 10:59 jbond@deploy1002: Started deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 to netbox-next
* 10:47 topranks: [[phab:T283163|T283163]]: Adding "metric-out minimum-igp" to BGP group Confed_eqord on eqiad, codfw and eqdfw CRs.
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16410 and previous config saved to /var/cache/conftool/dbconfig/20210610-104635-root.json
* 10:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/WikiEditor/modules/jquery.wikiEditor.js: {{Gerrit|8a17c43c5470b84ba58239bb2cf947dbebf1979f}}: Fix call to renamed var ([[phab:T284716|T284716]]) (duration: 01m 25s)
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 5%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16409 and previous config saved to /var/cache/conftool/dbconfig/20210610-103132-root.json
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16408 and previous config saved to /var/cache/conftool/dbconfig/20210610-103032-marostegui.json
* 10:29 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:28 kormat: running optimize tables against pc1009 (pc3) [[phab:T282761|T282761]]
* 10:25 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:21 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16407 and previous config saved to /var/cache/conftool/dbconfig/20210610-101858-root.json
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16406 and previous config saved to /var/cache/conftool/dbconfig/20210610-100355-root.json
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 60%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16405 and previous config saved to /var/cache/conftool/dbconfig/20210610-094851-root.json
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16404 and previous config saved to /var/cache/conftool/dbconfig/20210610-093346-root.json
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16402 and previous config saved to /var/cache/conftool/dbconfig/20210610-093003-marostegui.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16401 and previous config saved to /var/cache/conftool/dbconfig/20210610-092246-marostegui.json
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 40%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16399 and previous config saved to /var/cache/conftool/dbconfig/20210610-091842-root.json
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 30%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16398 and previous config saved to /var/cache/conftool/dbconfig/20210610-090345-root.json
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 30%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16397 and previous config saved to /var/cache/conftool/dbconfig/20210610-090339-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 20%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16396 and previous config saved to /var/cache/conftool/dbconfig/20210610-084841-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 20%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16395 and previous config saved to /var/cache/conftool/dbconfig/20210610-084835-root.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16394 and previous config saved to /var/cache/conftool/dbconfig/20210610-083338-root.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16393 and previous config saved to /var/cache/conftool/dbconfig/20210610-083332-root.json
* 08:25 volans: uploaded spicerack_0.0.53 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 5%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16392 and previous config saved to /var/cache/conftool/dbconfig/20210610-081834-root.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 5%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16391 and previous config saved to /var/cache/conftool/dbconfig/20210610-081828-root.json
* 08:17 marostegui: Drop several grants from labswiki (wikitech) [[phab:T282074|T282074]]
* 07:57 jynus: reset-failed on cumin1001 after backup rerun
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P16389 and previous config saved to /var/cache/conftool/dbconfig/20210610-075702-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16388 and previous config saved to /var/cache/conftool/dbconfig/20210610-075247-marostegui.json
* 07:44 jynus: retrying s6 snapshots on eqiad, acking daemon failure
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16387 and previous config saved to /var/cache/conftool/dbconfig/20210610-073727-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16386 and previous config saved to /var/cache/conftool/dbconfig/20210610-072224-root.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16385 and previous config saved to /var/cache/conftool/dbconfig/20210610-070720-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16384 and previous config saved to /var/cache/conftool/dbconfig/20210610-065217-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16383 and previous config saved to /var/cache/conftool/dbconfig/20210610-064916-root.json
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16382 and previous config saved to /var/cache/conftool/dbconfig/20210610-063745-marostegui.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16381 and previous config saved to /var/cache/conftool/dbconfig/20210610-063412-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16380 and previous config saved to /var/cache/conftool/dbconfig/20210610-061909-root.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16379 and previous config saved to /var/cache/conftool/dbconfig/20210610-061806-root.json
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16378 and previous config saved to /var/cache/conftool/dbconfig/20210610-060405-root.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16377 and previous config saved to /var/cache/conftool/dbconfig/20210610-060302-root.json
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16376 and previous config saved to /var/cache/conftool/dbconfig/20210610-055327-marostegui.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16375 and previous config saved to /var/cache/conftool/dbconfig/20210610-055037-root.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16374 and previous config saved to /var/cache/conftool/dbconfig/20210610-054802-root.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16373 and previous config saved to /var/cache/conftool/dbconfig/20210610-054759-root.json
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16372 and previous config saved to /var/cache/conftool/dbconfig/20210610-053534-root.json
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16371 and previous config saved to /var/cache/conftool/dbconfig/20210610-053259-root.json
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16370 and previous config saved to /var/cache/conftool/dbconfig/20210610-053255-root.json
* 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16369 and previous config saved to /var/cache/conftool/dbconfig/20210610-052421-marostegui.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16368 and previous config saved to /var/cache/conftool/dbconfig/20210610-052030-root.json
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16367 and previous config saved to /var/cache/conftool/dbconfig/20210610-052017-marostegui.json
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16366 and previous config saved to /var/cache/conftool/dbconfig/20210610-050526-root.json


== 2021-06-09 ==
== 2021-09-11 ==
* 22:12 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh1002.wikimedia.org
* 19:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|27814b8eaacb5ba2fee1b6167a36ea14356a1ecf}}: testwiki: Fully remove securepoll-related groups ([[phab:T290808|T290808]]) (duration: 00m 57s)
* 22:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1002.wikimedia.org
* 18:35 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript emptyUserGroup.php --wiki=testwiki <nowiki>{</nowiki>electionadmin,electcomm<nowiki>}</nowiki> # [[phab:T290808|T290808]]
* 21:59 dzahn@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host doh1002.wikimedia.org
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|908bbf35235ea4129795dfbf4c0e646440152e18}}: Revert "test: Add electcomm and electionadmin groups" ([[phab:T290808|T290808]]) (duration: 00m 58s)
* 21:53 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1002.wikimedia.org
* 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh1001.wikimedia.org
* 21:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1001.wikimedia.org
* 21:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/DiscussionTools/modules/dt-ve/CommentTargetWidget.less: Backport: [[gerrit:698681{{!}}Update surface styles for VE changes (T284567)]] (duration: 01m 14s)
* 21:40 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/includes/language/LanguageConverter.php: Backport: [[gerrit:699014{{!}}Revert "Add type hint to constructor of LanguageConverter" (T284685)]] (duration: 01m 24s)
* 21:08 mutante: rsyncing static-bugzilla HTML from miscweb1002 to deploy1002
* 21:00 mutante: deploy1002 - creating temp dir /srv/miscweb to rsync static-bugzilla data to, coming from miscweb1002 [[phab:T281538|T281538]]
* 20:36 mutante: deployed temp ferm change on deployment servers to let miscweb dump data, puppetized. scap pull from mwdebug1001 works, deployment good to go
* 19:08 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]] (duration: 01m 07s)
* 19:06 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]]
* 18:07 Krinkle: krinkle@mwmaint1002$ mwscript deleteEqualMessages.php (foreachwiki)
* 17:52 Krinkle: krinkle@mwmaint1002$ mwscript deleteEqualMessages.php --wiki rmywiki
* 17:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudmetrics1002.eqiad.wmnet
* 17:32 aborrero@cumin1001: START - Cookbook sre.hosts.remove-downtime for cloudmetrics1002.eqiad.wmnet
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
* 17:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
* 17:16 jayme: updated python3-docker-report to 0.0.12 on chartmuseum2001.codfw.wmnet,chartmuseum1001.eqiad.wmnet,deneb.codfw.wmnet,registry[2003-2008].codfw.wmnet,registry[1003-1004].eqiad.wmnet
* 16:35 jayme: import docker-report 0.0.12 into buster-wikimedia
* 15:37 hnowlan: rebuilding maps2009 as buster master
* 15:08 vgutierrez: restarting acme-chief on acmechief1001
* 15:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
* 15:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
* 15:01 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 55s)
* 15:00 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
* 14:57 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 04s)
* 14:57 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
* 14:51 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 15s)
* 14:50 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
* 14:45 moritzm: installing postgresql 9.6 security updates on stretch
* 14:37 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate WMDEBanner* schemas to EventPlatform on all wikis - [[phab:T282562|T282562]] (duration: 01m 06s)
* 14:33 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate LandingPageImpression schema to EventPlatform on all wikis - [[phab:T282855|T282855]] (duration: 01m 06s)
* 14:23 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate LandingPageImpression schema to EventPlatform on testwiki - [[phab:T282855|T282855]] (duration: 01m 07s)
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16358 and previous config saved to /var/cache/conftool/dbconfig/20210609-141807-root.json
* 14:08 hnowlan@puppetmaster1001: conftool action : set/weight=0; selector: name=maps2009.codfw.wmnet
* 14:08 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 13:59 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate WMDEBanner* schemas to EventPlatform on testwiki - [[phab:T282562|T282562]] (duration: 01m 08s)
* 13:56 XioNoX: upgrade Routinator 3000 to 0.9.0 on rpki1001 - [[phab:T282469|T282469]]
* 13:54 XioNoX: Add Routinator 3000 0.9.0 to the APT repo - [[phab:T282469|T282469]]
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16356 and previous config saved to /var/cache/conftool/dbconfig/20210609-134800-root.json
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16355 and previous config saved to /var/cache/conftool/dbconfig/20210609-133257-root.json
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16354 and previous config saved to /var/cache/conftool/dbconfig/20210609-132958-marostegui.json
* 13:12 moritzm: installing nginx security updates
* 13:10 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 02m 26s)
* 13:07 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
* 13:07 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 00m 10s)
* 13:07 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
* 13:07 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 01m 14s)
* 13:05 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16351 and previous config saved to /var/cache/conftool/dbconfig/20210609-130114-root.json
* 12:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
* 12:47 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: roll back to HEAD~1 (duration: 00m 53s)
* 12:46 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: roll back to HEAD~1
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16350 and previous config saved to /var/cache/conftool/dbconfig/20210609-124610-root.json
* 12:43 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 28s)
* 12:42 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:42 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 01m 08s)
* 12:41 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:41 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 47s)
* 12:40 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:39 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 41s)
* 12:39 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16349 and previous config saved to /var/cache/conftool/dbconfig/20210609-123615-root.json
* 12:35 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
* 12:33 godog: lists1001:rm /var/lib/prometheus/node.d/mailman_queues.prom
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16348 and previous config saved to /var/cache/conftool/dbconfig/20210609-123106-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16347 and previous config saved to /var/cache/conftool/dbconfig/20210609-122111-root.json
* 12:18 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 03m 38s)
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16345 and previous config saved to /var/cache/conftool/dbconfig/20210609-121603-root.json
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P16344 and previous config saved to /var/cache/conftool/dbconfig/20210609-121501-marostegui.json
* 12:14 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:13 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 53s)
* 12:12 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 12:10 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 44s)
* 12:09 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 12:09 hnowlan: running `nodetool decommission` on maps2009
* 12:06 hnowlan: stopped tilerator on maps2009
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16343 and previous config saved to /var/cache/conftool/dbconfig/20210609-120608-root.json
* 12:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps2009.codfw.wmnet with reason: Postgis version juggling
* 12:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps2009.codfw.wmnet with reason: Postgis version juggling
* 12:04 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
* 12:03 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 06s)
* 12:03 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 12:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ac43baa}}: {{Gerrit|d185728}}: WelcomeSurveyExperimentalGroups: Use new syntax ([[phab:T284599|T284599]]) (duration: 01m 19s)
* 11:59 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 54s)
* 11:58 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:54 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 41s)
* 11:54 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:53 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 03m 11s)
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16342 and previous config saved to /var/cache/conftool/dbconfig/20210609-115104-root.json
* 11:50 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:49 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 02m 16s)
* 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P16341 and previous config saved to /var/cache/conftool/dbconfig/20210609-114944-marostegui.json
* 11:47 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:47 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 05s)
* 11:46 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:46 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 53s)
* 11:45 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
* 11:40 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: redeploy HEAD~1 (duration: 01m 55s)
* 11:38 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: redeploy HEAD~1
* 11:36 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: redeploy HEAD~1 (duration: 00m 54s)
* 11:35 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: redeploy HEAD~1
* 11:34 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: re-try (duration: 02m 23s)
* 11:32 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: re-try
* 11:32 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: re-try (duration: 00m 59s)
* 11:31 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: re-try
* 11:27 jbond: drop keep_env from sudo config - #[[phab:T275852|T275852]]
* 11:22 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 43s)
* 11:22 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 11:21 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 01m 15s)
* 11:20 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
* 11:11 awight: EU deployment window complete
* 11:10 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:698855{{!}}Set wgAutoConfirmCount to 10 for enwikisource (T284627)]] (duration: 02m 04s)
* 10:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1130.eqiad.wmnet with reason: REIMAGE
* 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1130.eqiad.wmnet with reason: REIMAGE
* 10:15 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 00m 53s)
* 10:14 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 10:13 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 05m 41s)
* 10:07 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 10:06 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 00m 38s)
* 10:06 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 [[phab:T283235|T283235]]', diff saved to https://phabricator.wikimedia.org/P16337 and previous config saved to /var/cache/conftool/dbconfig/20210609-100423-marostegui.json
* 10:00 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 00m 48s)
* 09:59 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 09:58 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on schema* after switch towards nginx-light [[phab:T164456|T164456]]
* 07:54 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:16 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:26 XioNoX: Add 185.71.138.0/24 to network::external and diffscan - [[phab:T252132|T252132]]
* 06:12 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16334 and previous config saved to /var/cache/conftool/dbconfig/20210609-053213-root.json
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16333 and previous config saved to /var/cache/conftool/dbconfig/20210609-051710-root.json
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16332 and previous config saved to /var/cache/conftool/dbconfig/20210609-050206-root.json
* 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16331 and previous config saved to /var/cache/conftool/dbconfig/20210609-044703-root.json
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 to remove rev_page_id index [[phab:T163532|T163532]]', diff saved to https://phabricator.wikimedia.org/P16330 and previous config saved to /var/cache/conftool/dbconfig/20210609-044428-marostegui.json
* 04:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 03:30 eileen: civicrm revision changed from {{Gerrit|eac772e9c9}} to {{Gerrit|31d07115a0}}, config revision is {{Gerrit|931a941a5e}}
* 03:01 Amir1: mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki=aawiktionary --site-group wiktionary  ([[phab:T284444|T284444]])
* 02:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:56 Amir1: clean up of the rest of mbox files (except arbcom) ([[phab:T282303|T282303]])
* 02:55 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 02:49 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1010.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "xfer categories following reimage" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
* 02:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:39 ryankemper: [[phab:T280382|T280382]] Re-enabled puppet on `wdqs1010`
* 01:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 00:37 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:698654{{!}}Enable Wikisource OCR on select Wikisources (T283898)]] (duration: 01m 31s)
* 00:00 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1010.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring skolemized wikidata.jnl so we can reimage wdqs1009" --blazegraph_instance blazegraph --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
* 00:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer


== 2021-06-08 ==
== 2021-09-10 ==
* 22:36 krinkle@deploy1002: Finished deploy [integration/docroot@d4c9e08]: (no justification provided) (duration: 00m 08s)
* 21:28 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 22:36 krinkle@deploy1002: Started deploy [integration/docroot@d4c9e08]: (no justification provided)
* 21:27 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 22:21 ryankemper: [[phab:T284479|T284479]] Block put back in place. We're back to expected traffic levels. We'll need a more granular mitigation in place before we can lift this block going forward.
* 21:21 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 22:15 ryankemper: [[phab:T284479|T284479]] Successful puppet run on `cp3052`, proceeding to rest of `A:cp-text`: `sudo cumin -b 19 'A:cp-text' 'run-puppet-agent -q'`
* 20:46 jhuneidi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:14 ryankemper: [[phab:T284479|T284479]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/698850, running puppet on `cp3052.esams.wmnet`
* 20:44 jhuneidi@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:10 ryankemper: [[phab:T284479|T284479]] Yup more than enough evidence of a strong upward spike now. Proceeding to revert
* 20:42 jhuneidi@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 22:10 ryankemper: [[phab:T284479|T284479]] Already starting to see a large upward spike in requests. Doing a quick sanity check to make sure this is out of the ordinary but I'll likely be putting the block back in place shortly
* 18:34 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 22:09 ryankemper: [[phab:T284479|T284479]] Puppet run complete across all of `cp-text`. Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?viewPanel=47&orgId=1&from=now-1h&to=now over the next few minutes to see if we see a large spike in `full_text` and `entity_full_text` queries
* 18:08 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 22:03 ryankemper: [[phab:T284479|T284479]] Successful puppet run on `cp3052`, proceeding to rest of `A:cp-text`: `sudo cumin -b 15 'A:cp-text' 'run-puppet-agent -q'`
* 17:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster2005.codfw.wmnet with reason: REIMAGE
* 22:01 ryankemper: [[phab:T284479|T284479]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/698849, running puppet on `cp3052.esams.wmnet`
* 17:14 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2005.codfw.wmnet with reason: REIMAGE
* 21:59 ryankemper: [[phab:T284479|T284479]] Prior context: We put a block on a range of Google App Engine IPs yesterday to protect Cirrussearch from a bad actor; now we're going to try lifting the block and seeing if we're still getting slammed with traffic
* 16:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: REIMAGE
* 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1009.eqiad.wmnet with reason: REIMAGE
* 16:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: REIMAGE
* 21:42 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1009.eqiad.wmnet with reason: REIMAGE
* 16:14 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 21:29 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1009.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_1009`
* 16:03 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 21:27 ryankemper: [[phab:T280382|T280382]] Disabled puppet on `wdqs1010` out of abundance of caution; will re-enable after wdqs1009 is reimaged and xfer back is complete
* 15:39 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 21:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:27 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 20:38 bblack: authdns1001: update gdnsd to 3.7.0-2~wmf1
* 14:48 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:18 bblack: authdns2001: update gdnsd to 3.7.0-2~wmf1
* 14:43 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:55 bblack: dns[1235]002: update gdnsd to 3.7.0-2~wmf1
* 13:54 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:53 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]]
* 09:31 XioNoX: push pfw policies - [[phab:T290611|T290611]]
* 19:46 bblack: dns[1235]001: update gdnsd to 3.7.0-2~wmf1
* 09:07 mutante: planet - deleted all state files for all languages, running fresh update via systemctl start for all languages after proxy changes ([[phab:T285251|T285251]])
* 19:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 08:37 jynus: upgrade and restart db2139
* 19:36 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 19:36 ryankemper: [[phab:T280382|T280382]] Cancelling the data-transfer run to restart it; realized that the cookbook will start up the `wdqs-updater` again so will locally hack the cookbook on `cumin1001` to prevent that
* 08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 19:32 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Echo/modules/nojs/mw.echo.alert.monobook.less: Backport: [[gerrit:698848{{!}}Fix MonoBook orange banner hover styles (T284496)]] (duration: 01m 08s)
* 08:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 19:26 bblack: dns400[12]: update gdnsd to 3.7.0-3~wmf1
* 08:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 19:25 bblack: apt: update gdnsd package to gdnsd-3.7.0-2~wmf1 (fix systemd reload issues)
* 08:12 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 19:20 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1009.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring skolemized wikidata.jnl so we can reimage wdqs1009" --blazegraph_instance blazegraph --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
* 08:12 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 19:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 07:58 jayme: updating rsyslog to 8.1901.0-1~bpo9+wmf2 on kubernetes-workers - [[phab:T289766|T289766]]
* 19:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 07:57 moritzm: installing ntfs-3g security updates
* 19:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 07:46 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 19:18 ryankemper: [[phab:T280382|T280382]] `sudo systemctl stop wdqs-updater wdqs-blazegraph` on `wdqs1010` in preparation for transfer
* 07:45 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 19:08 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo pool` (all caught up on lag)
* 07:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 18:47 bblack: dns4001: update gdnsd to 3.7.0-1~wmf1
* 07:31 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:43 bblack: apt: update gdnsd package to gdnsd-3.7.0-1~wmf1
* 07:25 jayme: updating rsyslog to 8.1901.0-1~bpo9+wmf2 on kubernetes-staging - [[phab:T289766|T289766]]
* 17:49 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 07:19 jayme: imported rsyslog 8.1901.0-1~bpo9+wmf2 to stretch-wikimedia - [[phab:T289766|T289766]]
* 17:36 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 06:56 effie: disable puppet on deploy1002 and mw2254
* 17:25 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 06:29 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 17:10 elukey: fix dbstore1007's ip address in analytics-in4 on cr<nowiki>{</nowiki>1,2<nowiki>}</nowiki>-eqiad
* 06:27 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 17:06 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]] (duration: 34m 12s)
* 06:26 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 16:32 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.9  refs [[phab:T281150|T281150]]
* 06:26 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 16:27 papaul: powerdown moss-fe2002 for relocation
* 06:02 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2280.codfw.wmnet
* 16:06 papaul: powerdown ms-backup2002 for relocation
* 05:59 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:02 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:56 elukey: powercycle mw2280 - no tty available in mgmt, no ssh, host frozen
* 15:40 papaul: powerdown ms-be2061 for relocation
* 05:55 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw2280.codfw.wmnet
* 15:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
* 05:54 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:33 papaul: powerdown thanos-fe2003 for relocation
* 05:45 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 Krinkle: mwmaint1002: Running purge-parsercache-now.php on server 4/4 (pc1009) ref P16060, [[phab:T280605|T280605]], [[phab:T282761|T282761]].
* 05:42 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc2009.codfw.wmnet,pc1009.eqiad.wmnet with reason: Purging parsercache pc3 [[phab:T282761|T282761]]
* 05:12 marostegui: Repool clouddb1017:3311
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc2009.codfw.wmnet,pc1009.eqiad.wmnet with reason: Purging parsercache pc3 [[phab:T282761|T282761]]
* 05:12 marostegui: Repool clouddb1013:3311
* 15:13 papaul: powerdown cp2034 for relocation
* 04:49 marostegui: Depool clouddb1013:3311
* 15:04 papaul: powerdown cp2033 for relocation
* 04:49 marostegui: Depool clouddb1017:3311
* 14:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
* 02:52 eileen: civicrm revision changed from {{Gerrit|83f514f693}} to {{Gerrit|1f071f6c6c}}, config revision is {{Gerrit|23eda8ba3a}}
* 14:43 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on testreduce1001/scandium after switch towards nginx-light [[phab:T164456|T164456]]
* 00:35 tgr: Deployed patch for [[phab:T290692|T290692]]
* 14:08 marostegui: Restart sanitarium hosts (db2094, db2095, db1154, db1155) to pick up new filters [[phab:T284106|T284106]]
* 14:05 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc3 master [[phab:T282761|T282761]] (duration: 00m 57s)
* 14:05 kormat: setting pc1010 as pc3 primary [[phab:T282761|T282761]]
* 13:51 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 00m 42s)
* 13:51 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 13:48 otto@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 13:41 otto@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 13:40 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 00m 47s)
* 13:39 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 13:36 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next  (duration: 01m 03s)
* 13:35 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 13:33 otto@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - otto@cumin1001
* 13:22 otto@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - otto@cumin1001
* 12:15 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1008 as pc2 master [[phab:T282761|T282761]] (duration: 00m 57s)
* 12:14 kormat: setting pc1008 back as pc2 primary [[phab:T282761|T282761]]
* 11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ef49422b162ab0161bc39da857b3230175ac4492}}: enwiki: Disable indexing on the Book namespace ([[phab:T283522|T283522]]) (duration: 00m 56s)
* 11:46 urbanecm: Start server-side upload for 1 file ([[phab:T283470|T283470]])
* 11:45 moritzm: installing nginx security updates on buster
* 11:43 urbanecm: Start server-side upload for 2 files ([[phab:T283645|T283645]], [[phab:T283583|T283583]])
* 11:39 urbanecm: EU B&C deployment done
* 11:38 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16329 and previous config saved to /var/cache/conftool/dbconfig/20210608-113857-kormat.json
* 11:38 moritzm: installing ruby-nokogiri security updates
* 11:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/WikimediaEvents/: {{Gerrit|b0b46530b731d2a5f17b0aa04a4cf99df175e23d}}: universalLanguageSelector: Add missing properties ([[phab:T280770|T280770]]) (duration: 00m 56s)
* 11:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/UniversalLanguageSelector/resources/js/ext.uls.launch.js: {{Gerrit|5df13eeae3b52b98eaf3fdb99ddfa5a0f7b2b1e4}}: Pass context to compact_language_links.open hook ([[phab:T280770|T280770]]) (duration: 00m 57s)
* 11:23 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16328 and previous config saved to /var/cache/conftool/dbconfig/20210608-112354-kormat.json
* 11:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|73dc708efc25caa667be516c685885db3983be73}}: lvwiki: Enable Growth features in dark mode ([[phab:T278191|T278191]]; 3/3) (duration: 00m 58s)
* 11:13 urbanecm@deploy1002: Synchronized wmf-config/config/lvwiki.yaml: {{Gerrit|73dc708efc25caa667be516c685885db3983be73}}: lvwiki: Enable Growth features in dark mode ([[phab:T278191|T278191]]; 2/3) (duration: 00m 56s)
* 11:12 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|73dc708efc25caa667be516c685885db3983be73}}: lvwiki: Enable Growth features in dark mode ([[phab:T278191|T278191]]; 1/3) (duration: 00m 57s)
* 11:10 urbanecm: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=lvwiki growthexperiments # [[phab:T278191|T278191]]
* 11:08 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16327 and previous config saved to /var/cache/conftool/dbconfig/20210608-110850-kormat.json
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|abd401074247d1f1dd2722c2d4d06747b066d547}}: enwiki: Deploy Growth features to 2% of new accounts ([[phab:T281896|T281896]]) (duration: 00m 57s)
* 11:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Rebooting pc1008
* 11:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Rebooting pc1008
* 10:53 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: reimaged to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16326 and previous config saved to /var/cache/conftool/dbconfig/20210608-105346-kormat.json
* 10:50 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4) (duration: 00m 53s)
* 10:49 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4)
* 10:16 liw: testing upcoming Scap release on beta
* 10:01 XioNoX: upgrade Routinator 3000 to 0.9.0 on rpki2001 - [[phab:T282469|T282469]]
* 09:58 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4) (duration: 00m 54s)
* 09:57 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4)
* 09:52 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:04 jayme: removing docker-images from registry: releng/ci-jessie, releng/ci-src-setup, releng/composer-php56, releng/composer-test-php56, releng/npm, releng/npm-test, releng/npm-test-3d2png, releng/npm-test-graphoid, releng/npm-test-librdkafka, releng/npm-test-maps-service, releng/php56, releng/quibble-jessie, releng/quibble-jessie-hhvm, releng/quibble-jessie-php56 - [[phab:T251918|T251918]]
* 08:31 dcausse: depooling wdqs1006 (lag)
* 08:29 dcausse: restarting blazegraph on wdqs1006
* 08:19 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:13 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:49 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
* 07:41 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 07:40 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:37 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16324 and previous config saved to /var/cache/conftool/dbconfig/20210608-072937-root.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16323 and previous config saved to /var/cache/conftool/dbconfig/20210608-071433-root.json
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16322 and previous config saved to /var/cache/conftool/dbconfig/20210608-065930-root.json
* 06:52 tgr: [[phab:T283606|T283606]]: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=<nowiki>{</nowiki>ar,bn,cs,vi<nowiki>}</nowiki>wiki --verbose --search-index with gerrit:696307 applied
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16321 and previous config saved to /var/cache/conftool/dbconfig/20210608-064426-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for upgrade', diff saved to https://phabricator.wikimedia.org/P16320 and previous config saved to /var/cache/conftool/dbconfig/20210608-064055-marostegui.json
* 06:27 elukey: clean some airflow logs on an-airflow1001 as one off to free space (had a chat with the Search team first)
* 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
* 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
* 05:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
* 05:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
* 04:54 marostegui: Repool clouddb1019:3314
* 04:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 02:38 ryankemper: [[phab:T284445|T284445]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "repairing overinflated blazegraph journal" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs`
* 02:37 ryankemper: [[phab:T284445|T284445]] after manually stopping blazegraph/wdqs-updater, `sudo rm -fv /srv/wdqs/wikidata.jnl` on `wdqs1012` (clearing old overinflated journal file away before xferring new one)
* 02:34 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo depool` (catching up on ~7h of lag)


== 2021-06-07 ==
== 2021-09-09 ==
* 21:26 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 23:07 brennen: no takers on patches, ending backport & config training window.
* 21:12 sbassett: Deployed security patch for [[phab:T284364|T284364]]
* 21:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:30 ryankemper: [[phab:T284479|T284479]] [Cirrussearch] We'll keep monitoring. For now this incident is resolved. Glancing at our current volume relative to what we'd expect, the numbers we see match what we'd expect. If we're accidentally banning any innocent requests they must be an incredibly small percentage of the total otherwise we'd see significantly lower volume than expected
* 21:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:25 ryankemper: [[phab:T284479|T284479]] [Cirrussearch] Seeing the expected drop in `entity_full_text` requests here: https://grafana-rw.wikimedia.org/d/000000455/elasticsearch-percentiles?viewPanel=47&orgId=1&from=now-12h&to=now As a result we're no longer rejecting any requests
* 21:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:21 ryankemper: [[phab:T284479|T284479]] [Cirrussearch] We're working on rolling out https://gerrit.wikimedia.org/r/698607, which will ban search API requests that match the Google App Engine IP range `2600:1900::0/28` AND whose user agent includes `HeadlessChrome`
* 21:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:19 cdanis: [[phab:T284479|T284479]] ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕞🍵 sudo cumin -b16 'A:cp-text' "run-puppet-agent"
* 21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:07 andrew@deploy1002: Finished deploy [horizon/deploy@6199b67]: disable shelve/unshelve [[phab:T284462|T284462]] (duration: 04m 53s)
* 20:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:02 andrew@deploy1002: Started deploy [horizon/deploy@6199b67]: disable shelve/unshelve [[phab:T284462|T284462]]
* 19:40 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:01 andrew@deploy1002: Finished deploy [horizon/deploy@6199b67]: disable shelve/unshelve (duration: 02m 01s)
* 19:37 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:59 andrew@deploy1002: Started deploy [horizon/deploy@6199b67]: disable shelve/unshelve
* 19:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:57 herron: prometheus3001: moved /srv back to vda1 filesystem [[phab:T243057|T243057]]
* 19:04 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:26 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki/php-1.37.0-wmf.7]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=skwiki --phab=[[phab:T284149|T284149]]
* 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/includes/WelcomeSurvey.php: {{Gerrit|368b5d9}}: {{Gerrit|0e79aee}}: WelcomeSurvey backports ([[phab:T284127|T284127]], [[phab:T284257|T284257]]; 2/2) (duration: 00m 57s)
* 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:22 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/extension.json: {{Gerrit|368b5d9}}: {{Gerrit|0e79aee}}: WelcomeSurvey backports ([[phab:T284127|T284127]], [[phab:T284257|T284257]]; 1/2) (duration: 00m 56s)
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bc4f20437868b39ae2cc4eac8735ecb8bcd93157}}: Growth: Push 44 wikis out of dark mode ([[phab:T289680|T289680]]) (duration: 00m 57s)
* 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|7089728}}: {{Gerrit|b2482fb}}: initWikiConfig GE backports ([[phab:T284072|T284072]]) (duration: 00m 58s)
* 18:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6af38d951f0ef9af369e2172c175628dc6e9a281}}: Deploy Growth features in dark modes to ~200 wikis ([[phab:T290582|T290582]]; 3/3) (duration: 00m 57s)
* 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|15e09109b7c45de967a496a0eb58ad267dbc5079}}: skwiki: Make Growth features available in dark mode ([[phab:T284149|T284149]]; 3/3) (duration: 00m 56s)
* 18:22 urbanecm@deploy1002: Synchronized wmf-config/config/: {{Gerrit|6af38d951f0ef9af369e2172c175628dc6e9a281}}: Deploy Growth features in dark modes to ~200 wikis ([[phab:T290582|T290582]]; 2/3) (duration: 01m 01s)
* 18:14 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|15e09109b7c45de967a496a0eb58ad267dbc5079}}: skwiki: Make Growth features available in dark mode ([[phab:T284149|T284149]]; 2/3) (duration: 00m 56s)
* 18:21 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|6af38d951f0ef9af369e2172c175628dc6e9a281}}: Deploy Growth features in dark modes to ~200 wikis ([[phab:T290582|T290582]]; 1/3) (duration: 00m 58s)
* 18:14 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 18:21 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 18:14 ottomata: rolling restart of kafka jumbo brokers  - [[phab:T283067|T283067]]
* 18:20 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 18:13 urbanecm@deploy1002: Synchronized wmf-config/config/skwiki.yaml: {{Gerrit|15e09109b7c45de967a496a0eb58ad267dbc5079}}: skwiki: Make Growth features available in dark mode ([[phab:T284149|T284149]]; 1/3) (duration: 00m 59s)
* 18:20 urbanecm@deploy1002: sync-file aborted: {{Gerrit|6af38d951f0ef9af369e2172c175628dc6e9a281}}: Deploy Growth features in dark modes to ~200 wikis ([[phab:T290582|T290582]]) (duration: 00m 05s)
* 18:12 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 18:18 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
* 18:04 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=skwiki growthexperiments # [[phab:T284149|T284149]]
* 18:18 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 18:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5de2f8b27b016a2cd8f424d8e40318edde5e5704}}: Set WelcomeSurveyEnableWithHomepage ([[phab:T281896|T281896]], [[phab:T284257|T284257]]) (duration: 00m 59s)
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:53 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 18:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 17:53 ottomata: rolling restart of kafka jumbo mirror makers  - [[phab:T283067|T283067]]
* 18:16 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: REIMAGE
* 17:17 ryankemper: [Cirrussearch] We're seeing ~10% of current requests being rejected by poolcounter, due to ~2x expected `eqiad.full_text` query volume and ~30x expected `eqiad.entity_full_text` query volume
* 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:56 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph locked up)
* 18:12 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/initWikiConfig.php --phab=[[phab:T290582|T290582]] {{!}} tee ~/initwikiconfig.out # [[phab:T290582|T290582]]
* 16:51 razzi: run homer '*.eqiad.wmnet' diff
* 18:11 urbanecm: Run extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments for wikis in P17258 ([[phab:T290582|T290582]])
* 16:49 ottomata: restarting mysqld analytics-meta replica on db1108 to apply config change - [[phab:T272973|T272973]]
* 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:31 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@19313f7]: Bump glent jar to 0.2.6 (duration: 04m 29s)
* 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:27 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@19313f7]: Bump glent jar to 0.2.6
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/config: no-op: {{Gerrit|76c51f2753aed9dc8e06b63de6657c3c94371a3c}}: Standardize indentation in several .yaml files (duration: 00m 58s)
* 16:09 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@f236b95]: Bump glent jar to 0.2.6 (duration: 00m 35s)
* 17:29 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 16:09 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@f236b95]: Bump glent jar to 0.2.6
* 17:28 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 14:57 moritzm: installing remaining lz4 security updates on buster
* 17:28 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 14:35 moritzm: installing isc-dhcp security updates
* 17:26 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113 (s5,s6) after upgrade', diff saved to https://phabricator.wikimedia.org/P16315 and previous config saved to /var/cache/conftool/dbconfig/20210607-141722-marostegui.json
* 17:25 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 (s5,s6) for upgrade', diff saved to https://phabricator.wikimedia.org/P16314 and previous config saved to /var/cache/conftool/dbconfig/20210607-141307-marostegui.json
* 17:22 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 13:35 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (3) (duration: 00m 52s)
* 17:21 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 13:34 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (3)
* 17:21 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 13:34 moritzm: installing libxml2 security updates on stretch
* 17:21 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
* 13:32 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 01m 14s)
* 17:20 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
* 13:31 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 17:14 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 13:28 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 54s)
* 17:14 jelto@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2021-09-09 17:14:12.502162
* 13:27 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
* 17:14 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 12:41 moritzm: removing now obsolete Java 8 packages from gerrit* [[phab:T268225|T268225]]
* 17:14 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 12:36 moritzm: removing now obsolete Java 8 packages from contint* [[phab:T268225|T268225]]
* 17:14 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 12:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 12:25 moritzm: installing nginx security updates on buster
* 17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 12:22 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki --add-prefix=BROKEN --fix # [[phab:T284442|T284442]]
* 17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 12:22 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki # [[phab:T284442|T284442]]
* 17:13 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 11:09 Lucas_WMDE: EU backport+config window done
* 17:13 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 11:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:697824{{!}}Add 2021 namespaces for wikimania wiki (T284235)]] (duration: 00m 56s)
* 17:12 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 10:48 volans: reset netbox-next DB with the latest prod dump
* 17:12 jelto@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2021-09-09 17:12:27.974410
* 10:42 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:698472{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 17:12 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 10:41 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:698472{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 17:08 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
* 17:07 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 10:38 godog: downgrade grafana to 7.4.2 on grafana2001 - [[phab:T282863|T282863]]
* 17:07 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 10:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 17:04 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 10:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
* 17:04 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 10:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 16:58 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 16:58 jelto@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 16:58 jelto@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 10:28 kormat: reimaging db1157 [[phab:T283131|T283131]]
* 16:57 jelto: start cookbook sre.switchdc.mediawiki eqiad codfw --live-test this will generate some additional SAL logs here
* 10:24 moritzm: remove now obsolete nginx mods and dependencies on htmldumper1001 [[phab:T164456|T164456]]
* 16:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
* 16:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
* 16:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
* 16:23 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 10:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
* 16:10 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 16:00 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 10:08 kormat@cumin1001: dbctl commit (dc=all): 'db1157 depooling: reimage to buster [[phab:T283131|T283131]]', diff saved to https://phabricator.wikimedia.org/P16311 and previous config saved to /var/cache/conftool/dbconfig/20210607-100822-kormat.json
* 15:34 volans@cumin1001: END (FAIL) - Cookbook sre.experimental.reimage (exit_code=99) for host sretest1001.eqiad.wmnet
* 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 15:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 15:28 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: [[gerrit:719610{{!}}pipeline: add comment redirecting to correct file]] (duration: 00m 59s)
* 09:43 moritzm: upgrading bullseye hosts to latest packages in testing
* 15:24 volans@cumin1001: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
* 14:47 mutante: planet - deleting all state and lock files for the "en" feeds ([[phab:T285251|T285251]] [[phab:T289984|T289984]])
* 09:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
* 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2002.wikimedia.org
* 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
* 14:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2002.wikimedia.org
* 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
* 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 09:03 moritzm: installing imagemagick security updates on stretch
* 14:25 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 06:05 marostegui: Upgrade mysql on dbstore1003 [[phab:T283235|T283235]]
* 14:19 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 05:57 marostegui: Stop dbstore1004 to clone dbstore1007 [[phab:T283125|T283125]]
* 14:19 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 05:37 marostegui: Depool clouddb1020 (s5, s8) for upgrade
* 14:11 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
* 05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2113.codfw.wmnet with reason: REIMAGE
* 13:48 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host mx2002.wikimedia.org
* 05:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2113.codfw.wmnet with reason: REIMAGE
* 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:48 marostegui: Depool clouddb1019:3314 (long running alter table)
* 13:11 mutante: planet1002 - re-enabling disabled puppet
* 13:06 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 13:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 13:05 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 13:03 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 13:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:49 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
* 10:48 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
* 10:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
* 10:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
* 10:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet
* 10:47 topranks: Removing peering to old IPs of AS139931 (BSCCL) at Equinix Singapore (cr3-eqsin).
* 10:45 topranks: Removing peering to AS24218 at Equinix Singapore (cr3-eqsin) - network no longer uses this ASN.
* 10:22 volans: upgrading spicerack on cumin1001
* 10:20 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
* 10:10 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 09:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2002.wikimedia.org
* 09:47 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1027.eqiad.wmnet
* 09:46 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 09:37 godog: swift eqiad add ms-be10[64-67] with initial weight - [[phab:T290546|T290546]]
* 09:19 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=eqiad
* 09:19 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
* 09:15 volans: rebooting sretest1001 to test ipmi reboot via spicerack
* 09:15 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on sretest1001.eqiad.wmnet with reason: testing reboot via ipmi
* 09:15 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:20:00 on sretest1001.eqiad.wmnet with reason: testing reboot via ipmi
* 09:13 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 09:09 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 08:59 godog: move swift traffic fully to codfw to rebalance eqiad - [[phab:T287539|T287539]]
* 08:59 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
* 08:58 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=codfw
* 08:56 volans: upgrading spicerack on cumin2002 to test the new release
* 08:50 volans: uploaded spicerack_0.0.59 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 08:23 jelto: run ansible change 719041 on gitlab1001
* 08:13 jelto: run ansible change 719041 on gitlab2001
* 07:07 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum1002.eqiad.wmnet
* 06:47 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum1002.eqiad.wmnet
* 04:37 ryankemper: [WDQS] Dispatched e-mail to the banned user agent (dailymotion)
* 03:57 ryankemper: [WDQS] Dispatched e-mail to WDQS public mailing list informing them the outage is over; all that's left is the e-mail to the banned UA
* 03:47 ryankemper: [WDQS] Restarting `wdqs-blazegraph` on `wdqs[2001-2008].codfw.wmnet`; if banning the dailymotion UA was sufficient then servers should come back up healthy and not drop back into deadlock
* 03:43 ryankemper: [WDQS] Running puppet agent on `wdqs[2001-2008].codfw.wmnet` to roll out https://gerrit.wikimedia.org/r/719753
* 03:29 ryankemper: [WDQS] There's no clear indication of them being a culprit, but by far the most common user agent is a dailymotion VideocatalogTopic UA (see https://logstash.wikimedia.org/goto/51f238e9010d0220e5d33c6c210be93e)
* 03:12 bstorm: attempting to start replication on clouddb1017 s1 [[phab:T290630|T290630]]
* 03:11 bstorm: stopping and restarting mariadb on clouddb1017 s1
* 03:04 ryankemper: [WDQS] Dispatched email to Wikidata public mailing list about reduced service availability
* 02:36 ryankemper: [WDQS] https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&from=1631152574841&to=1631154942992 shows the availability pattern, anywhere we see missing data (null) represents time that blazegraph was locked up and therefore unable to report metrics
* 02:34 ryankemper: [WDQS] For context I glanced at `ryankemper@cumin1001:~$ sudo -E cumin 'P<nowiki>{</nowiki>wdqs2*<nowiki>}</nowiki>' 'sudo systemctl status wdqs-blazegraph'` before doing the aforementioned restarts and they'd all last restarted between 25-28 minutes ago
* 02:33 ryankemper: [WDQS] Restarting `wdqs-blazegraph` across all of `wdqs2*`
* 00:50 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Don't set default  to Score (try #2) (duration: 00m 58s)
* 00:48 legoktm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/Score/includes/Score.php: Use the 'score' Shellbox if configured ([[phab:T290193|T290193]]) (duration: 00m 57s)
* 00:46 legoktm@deploy1002: Synchronized php-1.37.0-wmf.21/includes/shell/CommandFactory.php: shell: Fix $wgShellboxUrls by passing service name when creating BoxedCommand ([[phab:T290193|T290193]]) (duration: 00m 58s)
* 00:45 legoktm@deploy1002: sync-file aborted: shell: Fix $wgShellboxUrls by passing service name when creating BoxedCommand ([[phab:T290193|T290193]]) (duration: 00m 07s)
* 00:15 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove putenv() for GDFONTPATH (duration: 00m 58s)


== 2021-06-05 ==
== 2021-09-08 ==
* 16:16 Amir1: deleting all private archives of mm2. All are inaccessible now ([[phab:T282303|T282303]])
* 22:34 ryankemper: [WDQS] [[phab:T280247|T280247]] Ran puppet-agent on `miscweb*` following merge of https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/717649
* 15:21 Amir1: delete mbox files of group D and E in mm2 ([[phab:T282303|T282303]])
* 22:24 ryankemper: [WDQS] [[phab:T280247|T280247]] Ran puppet-agent on `miscweb*` following merge of https://gerrit.wikimedia.org/r/c/wikidata/query/gui-deploy/+/714623
* 14:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:55 ryankemper: [WDQS] [[phab:T280247|T280247]] Purged varnish to make sure change took effect: `echo 'https://query-preview.wikidata.org/' {{!}} mwscript purgeList.php` and `echo 'https://query.wikidata.org/' {{!}} mwscript purgeList.php` on `mwmaint1002`
* 00:21 mutante: backup1001 - systemctl bacula-dir works again (restoring backup for non-existing host)
* 21:53 ryankemper: [WDQS] [[phab:T280247|T280247]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/719502 and ran puppet-agent on `miscweb*`
* 00:18 mutante: backup1001 systemctl reload bacula-dir fails
* 20:49 eileen: civicrm revision changed from {{Gerrit|593d01f4fc}} to {{Gerrit|83f514f693}}, config revision is {{Gerrit|23eda8ba3a}}
* 20:41 legoktm: Successfully published image docker-registry.discovery.wmnet/php7.2-fpm-multiversion-base:1.0.2
* 19:25 Krinkle: krinkle@mw1369 Running some benchmarks in Eqiad on load.php
* 18:27 urbanecm@deploy1002: Synchronized wmf-config/config/itwiki.yaml: {{Gerrit|6bcbe61f9a89086b775d84a81d55a7587cf26780}}: Italian Wikipedia is now a group 1 wiki ([[phab:T286664|T286664]]; 2/2) (duration: 00m 58s)
* 18:26 urbanecm@deploy1002: Synchronized dblists/: {{Gerrit|6bcbe61f9a89086b775d84a81d55a7587cf26780}}: Italian Wikipedia is now a group 1 wiki ([[phab:T286664|T286664]]; 1/2) (duration: 00m 58s)
* 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bbefce6a3778f159ad68587c830dff4a1da0c792}}: Growth: Remove config that moved on-wiki ([[phab:T290295|T290295]]) (duration: 00m 58s)
* 18:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|950a377e5ba6f5d318135e31b36334532d9ae71b}}: Stop setting $wgAbuseFilterParserClass ([[phab:T239990|T239990]]) (duration: 00m 58s)
* 17:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2004.codfw.wmnet
* 16:53 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2004.codfw.wmnet
* 16:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2003.codfw.wmnet
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2003.codfw.wmnet
* 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps2001.codfw.wmnet
* 16:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|796e23c87ccfc48334ab932e13aab4f0ec746bbd}}: updateMenteeData.php: Make it possible to force update (duration: 00m 58s)
* 16:28 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:719524{{!}}Turn off jQuery migrate on wikisource wikis (T280944)]] (duration: 00m 59s)
* 16:23 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps2001.codfw.wmnet
* 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1006.eqiad.wmnet
* 16:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 16:14 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 16:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 16:13 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 16:13 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
* 15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
* 15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
* 15:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
* 15:41 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
* 15:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
* 15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
* 15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
* 15:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
* 14:57 marostegui: Retroactive: started to warm up eqiad databases
* 14:57 moritzm: installing 4.19.194 kernels on stretch systems with 4.19.x (no reboots yet)
* 14:54 brennen: gitlab: upgrading gitlab2001, followed by gitlab1001, to 14.2.3 ([[phab:T289802|T289802]])
* 14:53 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1067.eqiad.wmnet with reason: REIMAGE
* 14:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1067.eqiad.wmnet with reason: REIMAGE
* 14:33 moritzm: installing zeromq3 security updates
* 13:50 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@eb211ac]: kartotherian: restore v4 maxzoom to z15 (duration: 06m 42s)
* 13:44 mbsantos@deploy1002: Started deploy [kartotherian/deploy@eb211ac]: kartotherian: restore v4 maxzoom to z15
* 13:38 brennen: gitlab: upgrading gitlab2001, followed by gitlab1001, to 14.1.5 ([[phab:T289802|T289802]])
* 13:13 brennen: gitlab1001: downtiming alerts for 2.5 hours; upgrading to 14.0.10 ([[phab:T289802|T289802]])
* 12:45 brennen: gitlab: pausing all runners in preparation for upgrade to 14.0.10 ([[phab:T289802|T289802]])
* 11:57 moritzm: installing curl security updates on stretch
* 11:09 jbond: upload statograph_0.1.2
* 11:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 11:01 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 10:06 jelto: upgrade gitlab2001 to gitlab-ce=14.0.10-ce.0
* 10:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T289802
* 10:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T289802
* 09:38 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to wikimedia.org - [[phab:T210137|T210137]]
* 09:29 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to codfw - [[phab:T210137|T210137]]
* 09:09 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to eqiad - [[phab:T210137|T210137]]
* 07:45 godog: start rollout of prometheus-rsyslog-exporter 0.0.0+git20201008-3 to eqsin/esams/ulsfo - [[phab:T210137|T210137]]
* 06:46 ryankemper: [WDQS] Manually running puppet-agent on `miscweb2002.codfw.wmnet,miscweb1002.eqiad.wmnet`
* 06:45 ryankemper: [WDQS] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/719185 to rollback query.wikidata.org changes
* 02:59 eileen: civicrm revision changed from {{Gerrit|06ef98593f}} to {{Gerrit|593d01f4fc}}, config revision is {{Gerrit|5f004d94d7}}
* 00:00 legoktm: legoktm@lists1001:~$ sudo rm -rf /etc/mailman # cleanup as part of {{Gerrit|4869d91b0be}} / [[phab:T282303|T282303]]


== 2021-06-04 ==
== 2021-09-07 ==
* 22:08 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh4001.wikimedia.org
* 23:25 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:51 cwhite@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4001.wikimedia.org
* 23:20 robh@cumin1001: START - Cookbook sre.dns.netbox
* 20:59 bblack: repool cp1087 - [[phab:T278729|T278729]]
* 23:13 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:719381{{!}}Enable UrlShortener everywhere (T267925)]] (duration: 00m 58s)
* 20:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: REIMAGE
* 23:07 dpifke@deploy1002: Synchronized wmf-config/profiler.php: Config: [[gerrit:716041{{!}}profiler: use seperate pipeline inside k8s pods (T288165)]] (duration: 00m 58s)
* 20:09 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: REIMAGE
* 22:29 cstone: SmashPig revision changed from {{Gerrit|afd362b163}} to {{Gerrit|3607b16f83}}
* 19:06 bblack: depool cp1087 - [[phab:T278729|T278729]]
* 20:41 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:715018{{!}}Set $wgWBRepoSettings['tmpNormalizeDataValues'] on all wikis (T251480)]] (duration: 00m 59s)
* 18:21 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:31 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:36 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 20:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:33 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart
* 17:18 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:33 razzi@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
* 17:09 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:33 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart
* 17:01 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 17:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 16:39 moritzm: installing jetty9 security updates on buster
* 17:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 16:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 15:25 topranks: Adding 1:1 NAT configuration for fran2001 / analytics.codfw.wikimedia.org to pfw3-codfw (backup site)
* 16:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 14:47 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I434d9cfa29d84f}} (duration: 00m 56s)
* 16:30 dancy@deploy1002: Synchronized README: testing (duration: 00m 59s)
* 14:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/DiscussionTools/extension.json: {{Gerrit|Iea41ab8599ffae}} (duration: 00m 56s)
* 15:18 akosiaris: run_benchmarky.py against mwdebug.svc.codfw.wmnet for performance tests
* 14:44 krinkle@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/DiscussionTools/includes/: {{Gerrit|Iea41ab8599ffae}} (duration: 00m 59s)
* 15:07 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:41 krinkle@deploy1002: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org)
* 15:04 jbond: upload python-prometheus-client_0.6.0 to stretch-wikimedia
* 13:39 Krinkle: mwmaint1002: Running purge_parsercache_now.php on pc1008, server 3/4, ref [[phab:T282761|T282761]]
* 14:50 mutante: snapshot1015 - manually removed prometheus-puppet-agent-stats from crontab which was sending spam and is now a timer
* 13:33 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:33 mutante: CI - migrating zuul-merger cronjob to systemd timer (contint*)
* 12:46 marostegui: Upgrade mysql on clouddb1016 [[phab:T283235|T283235]]
* 14:23 XioNoX: re-pool esams-eqiad - [[phab:T288503|T288503]]
* 12:27 marostegui: Upgrade mysql on clouddb1015 [[phab:T283235|T283235]]
* 14:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: REIMAGE
* 11:20 jbond: upload debmonitor-client_0.3.0-1+deb10u3_all.deb to apt
* 14:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: REIMAGE
* 10:59 topranks: Running homer for Gerrit 698162: Set up BGP peering to doh5001 in eqsin, triggering DoH /24 announcement there.
* 14:22 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: REIMAGE
* 09:47 ema: pool cp1087 [[phab:T278729|T278729]]
* 14:22 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: REIMAGE
* 09:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
* 14:17 marostegui: No more db maintenance on eqiad [[phab:T288594|T288594]]
* 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
* 14:08 mutante: alert1001 - temp disabled puppet, stopped icinga-wm
* 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
* 14:07 mutante: temp killed icinga-wm because of flooding
* 09:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
* 14:01 Emperor: removing pc2010 from orchestrator [[phab:T289117|T289117]]
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16304 and previous config saved to /var/cache/conftool/dbconfig/20210604-091742-root.json
* 13:59 Emperor: removing pc2010 from tendril and zarcillo [[phab:T289117|T289117]]
* 09:06 ema: reboot cp1087 [[phab:T278729|T278729]]
* 13:57 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16303 and previous config saved to /var/cache/conftool/dbconfig/20210604-090239-root.json
* 13:57 XioNoX: drain esams-eqiad for circuit maintenance - [[phab:T288503|T288503]]
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: