You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

From Wikitech-static
imported>Stashbot
(jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .)
imported>Stashbot
(ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0))
 
(232 intermediate revisions by 3 users not shown)
== 2021-02-03 ==
== 2021-10-16 ==
* 00:16 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:13 legoktm@deploy1001: Synchronized logos/: Update and recompress logos for nlwiki, eswiki, ptwiki, ruwiki, svwiki, zhwiki (2/2) (duration: 01m 05s)
* 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:12 legoktm@deploy1001: Synchronized static/images/project-logos/: Update and recompress logos for nlwiki, eswiki, ptwiki, ruwiki, svwiki, zhwiki (1/2) (duration: 01m 10s)
* 01:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:10 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .


== 2021-02-02 ==
== 2021-10-15 ==
* 23:53 mutante: mw1300 - scap pull (it crashed earlier but is back after powercycling)
* 23:48 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:52 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 23:27 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:30 mutante: powercycling crashed mw1300.eqiad.wmnet
* 23:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1335.eqiad.wmnet
* 22:38 mutante: apt1001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 21:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1336.eqiad.wmnet
* 22:36 mutante: apt2001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 21:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1335.eqiad.wmnet
* 22:34 mutante: apt2001 - upgraded nginx
* 21:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1336.eqiad.wmnet
* 22:18 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1335.eqiad.wmnet with reason: REIMAGE
* 22:14 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1336.eqiad.wmnet with reason: REIMAGE
* 22:05 dpifke@deploy1002: Finished deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes (duration: 00m 05s)
* 21:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1335.eqiad.wmnet with reason: REIMAGE
* 22:05 dpifke@deploy1002: Started deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes
* 21:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1336.eqiad.wmnet with reason: REIMAGE
* 21:51 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕒☕ sudo cumin A:cp 'enable-puppet "cdanis deploying {{Gerrit|I7003b7b6}} and {{Gerrit|Idd0e124f5}} [[phab:T263496|T263496]]"'  # test on cp2027 looks good, perhaps slightly-increased Varnish CPU consumption but hard to be sure
* 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:00 Lucas_WMDE: Morning backport window done
* 21:44 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:58 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/WikibaseMediaInfo/: Backport: [[gerrit:661092{{!}}Pass $databaseName into WikiPageEntityDataLoader (T273622)]] (duration: 01m 07s)
* 21:36 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:57 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/Wikibase/: Backport: [[gerrit:661091{{!}}Add wiki ID to WikiPageEntityDataLoader (T273622)]] (duration: 01m 25s)
* 20:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:52 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕒☕ sudo cumin A:cp 'disable-puppet "cdanis deploying {{Gerrit|I7003b7b6}} and {{Gerrit|Idd0e124f5}} [[phab:T263496|T263496]]"'
* 18:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:00 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:48 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:17 mutante: gitlab1001 - disabling puppet for debugging
* 18:43 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:05 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold - [[phab:T283076|T283076]]
* 18:23 milimetric@deploy1001: Finished deploy [analytics/turnilo/deploy@052348b]: (no justification provided) (duration: 00m 03s)
* 17:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:23 milimetric@deploy1001: Started deploy [analytics/turnilo/deploy@052348b]: (no justification provided)
* 16:50 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold
* 18:22 milimetric@deploy1001: deploy aborted: (no justification provided) (duration: 00m 10s)
* 16:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:22 milimetric@deploy1001: Started deploy [analytics/turnilo/deploy@052348b]: (no justification provided)
* 16:44 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 18:17 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:07 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:03 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host auth2001.codfw.wmnet
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host auth1002.eqiad.wmnet
* 14:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host auth1002.eqiad.wmnet
* 14:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host auth2001.codfw.wmnet
* 14:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb2002.codfw.wmnet
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 13:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 15:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host miscweb2002.codfw.wmnet
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 13:30 elukey: start topic rebalancing for kafka main-eqiad (long maintenance, it will last a couple of days)
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 100%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14135 and previous config saved to /var/cache/conftool/dbconfig/20210202-143950-root.json
* 13:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1001.eqiad.wmnet
* 13:21 vgutierrez: updating acme-chief to version 0.34 on acmechief-test instances - [[phab:T292619|T292619]]
* 14:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host failoid1001.eqiad.wmnet
* 13:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
* 13:14 vgutierrez: upload acme-chief 0.34 to apt.wikimedia.org (buster) - [[phab:T292619|T292619]]
* 14:35 klausman@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2003.codfw.wmnet
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2002.codfw.wmnet
* 11:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
* 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2007.codfw.wmnet
* 14:26 hashar@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.29 (duration: 73m 10s)
* 11:45 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host debmonitor2002.codfw.wmnet
* 11:33 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 75%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14134 and previous config saved to /var/cache/conftool/dbconfig/20210202-142446-root.json
* 11:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2007.codfw.wmnet
* 14:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
* 11:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:21 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2003.codfw.wmnet
* 10:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:12 klausman@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2002.codfw.wmnet
* 09:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 50%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14133 and previous config saved to /var/cache/conftool/dbconfig/20210202-140943-root.json
* 09:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 14:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
* 08:58 jelto: jelto@gitlab1001:~$ sudo disable-puppet "disable puppet on gitlab1001 to test 728380 on GitLab replica - [[phab:T283076|T283076]]"
* 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 25%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14132 and previous config saved to /var/cache/conftool/dbconfig/20210202-135439-root.json
* 07:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:49 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2002.codfw.wmnet
* 06:20 urbanecm: Start server-side upload for 1 video file
* 13:49 klausman@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2001.codfw.wmnet
* 02:14 ryankemper: [[phab:T288231|T288231]] `wdqs2006` data transfer complete and all tests passing on the host. All of `codfw wdqs-internal` is on the new streaming updater
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 10%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14128 and previous config saved to /var/cache/conftool/dbconfig/20210202-133936-root.json
* 00:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 13:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
* 00:07 brennen: end of UTC late backport & config training window
* 13:32 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2001.codfw.wmnet
* 13:31 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd1003.eqiad.wmnet
* 13:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc2001.codfw.wmnet
* 13:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
* 13:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc1002.eqiad.wmnet
* 13:13 hashar@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.29
* 13:13 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1003.eqiad.wmnet
* 13:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host doc2001.codfw.wmnet
* 13:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host doc1002.eqiad.wmnet
* 13:11 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd1002.eqiad.wmnet
* 13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2001.wikimedia.org
* 13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2001.codfw.wmnet
* 13:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host failoid2001.codfw.wmnet
* 13:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host irc2001.wikimedia.org
* 13:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1002.wikimedia.org
* 13:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2002.wikimedia.org
* 13:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader2002.wikimedia.org
* 13:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader1002.wikimedia.org
* 12:52 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1002.eqiad.wmnet
* 12:52 klausman@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host ml-etcd1002.eqiad.wmnet
* 12:51 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1002.eqiad.wmnet
* 12:50 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on malmok.wikimedia.org with reason: rebooting for kernel update
* 12:50 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on malmok.wikimedia.org with reason: rebooting for kernel update
* 12:47 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on cescout1001.eqiad.wmnet with reason: rebooting for kernel update
* 12:46 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on cescout1001.eqiad.wmnet with reason: rebooting for kernel update
* 12:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki2001.codfw.wmnet
* 12:46 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-etcd1002.eqiad.wmnet
* 12:46 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
* 12:44 klausman@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-etcd2001.codfw.wmnet
* 12:43 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1002.eqiad.wmnet
* 12:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki1001.eqiad.wmnet
* 12:43 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
* 12:42 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
* 12:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4001.wikimedia.org
* 12:42 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5001.wikimedia.org
* 12:41 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd1001.eqiad.wmnet
* 12:41 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
* 12:40 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2001.codfw.wmnet
* 12:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2003.wikimedia.org
* 12:40 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host pki2001.codfw.wmnet
* 12:39 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3001.wikimedia.org
* 12:38 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host pki1001.eqiad.wmnet
* 12:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1003.wikimedia.org
* 12:37 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install5001.wikimedia.org
* 12:37 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install4001.wikimedia.org
* 12:36 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install3001.wikimedia.org
* 12:35 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install2003.wikimedia.org
* 12:35 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host install1003.wikimedia.org
* 12:34 klausman@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-etcd2001.codfw.wmnet
* 12:34 urbanecm@deploy1001: Synchronized docroot/noc/conf/index.php: {{Gerrit|995649efafc2f5a44824af1e96128baaf15ef928}}: noc: yaml files may be published w/o .txt extension (duration: 00m 57s)
* 12:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1001.wikimedia.org
* 12:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1001.wikimedia.org
* 12:30 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2001.codfw.wmnet
* 12:30 klausman@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-etcd2001.codfw.wmnet
* 12:29 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp1001.wikimedia.org
* 12:29 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host apt1001.wikimedia.org
* 12:26 urbanecm@deploy1001: Synchronized docroot/noc/createTxtFileSymlinks.sh: {{Gerrit|210647e915c91a4bddf0407d05436a9e231d3f29}}: noc: Publicly expose logos/config.yaml (2/2; [[phab:T273330|T273330]]) (duration: 00m 55s)
* 12:23 urbanecm@deploy1001: Synchronized docroot/noc/conf/logos-config.yaml: {{Gerrit|210647e915c91a4bddf0407d05436a9e231d3f29}}: noc: Publicly expose logos/config.yaml (1/2; [[phab:T273330|T273330]]) (duration: 00m 57s)
* 12:22 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2001.codfw.wmnet
* 12:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/GrowthExperiments/includes/HomepageModules/Banner.php: {{Gerrit|da8f328640ca5c46385a57e706cd76614bbfdc7a}}: Banner module: Switch to using activated/unactivated for state ([[phab:T273084|T273084]]) (duration: 00m 58s)
* 12:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/GrowthExperiments/includes/Specials/SpecialHomepage.php: {{Gerrit|18c59d018b6ef72c750e25588518d2df6f492db3}}: SpecialHomepage: Do not load start-startediting if SE arent enabled ([[phab:T273243|T273243]]) (duration: 01m 01s)
* 12:18 klausman@cumin1001: START - Cookbook sre.ganeti.makevm for new host ml-etcd1001.eqiad.wmnet
* 12:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
* 12:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2001.wikimedia.org
* 12:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2001.wikimedia.org
* 12:14 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
* 12:14 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
* 12:13 jbond42: upload cas_6.3 package
* 12:12 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp2001.wikimedia.org
* 12:12 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 12:11 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
* 11:06 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
* 11:04 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
* 10:30 XioNoX: re-enable DE-CIX codfw peering sessions
* 10:17 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 to clone db1174 - [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14121 and previous config saved to /var/cache/conftool/dbconfig/20210202-100859-marostegui.json
* 10:08 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 10:02 hashar: Restarted Gerrit primary on gerrit1001 # [[phab:T273223|T273223]]
* 10:00 hashar@deploy1001: Finished deploy [gerrit/gerrit@c3cd63b]: Gerrit primary on gerrit1001 to v3.2.7 [[phab:T273223|T273223]] (duration: 00m 09s)
* 10:00 hashar@deploy1001: Started deploy [gerrit/gerrit@c3cd63b]: Gerrit primary on gerrit1001 to v3.2.7 [[phab:T273223|T273223]]
* 10:00 hashar: Restarted Gerrit replica on gerrit2001 # [[phab:T273223|T273223]]
* 09:56 hashar@deploy1001: Finished deploy [gerrit/gerrit@c3cd63b]: Gerrit replica on gerrit2001 to v3.2.7 [[phab:T273223|T273223]] (duration: 00m 12s)
* 09:56 hashar@deploy1001: Started deploy [gerrit/gerrit@c3cd63b]: Gerrit replica on gerrit2001 to v3.2.7 [[phab:T273223|T273223]]
* 09:27 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1381.eqiad.wmnet
* 08:56 XioNoX: disable DE-CIX codfw peering session
* 08:30 godog: swift eqiad-prod: add weight back to sdg on ms-be1054 - [[phab:T273582|T273582]]
* 08:02 legoktm: depooled mw1381.eqiad.wmnet for perf testing ([[phab:T273312|T273312]])
* 07:59 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1381.eqiad.wmnet
* 07:45 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1403.eqiad.wmnet
* 07:45 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1405.eqiad.wmnet
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14118 and previous config saved to /var/cache/conftool/dbconfig/20210202-073105-root.json
* 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14117 and previous config saved to /var/cache/conftool/dbconfig/20210202-071602-root.json
* 07:14 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14116 and previous config saved to /var/cache/conftool/dbconfig/20210202-070057-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14115 and previous config saved to /var/cache/conftool/dbconfig/20210202-064553-root.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Repool es1022 after a restart', diff saved to https://phabricator.wikimedia.org/P14114 and previous config saved to /var/cache/conftool/dbconfig/20210202-063050-root.json
* 06:24 marostegui: Restart mysql on es1022
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1022 [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P14113 and previous config saved to /var/cache/conftool/dbconfig/20210202-062303-marostegui.json
* 04:12 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 03:40 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 03:40 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 03:40 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 03:36 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@ad9db35]: 0.3.62 (duration: 06m 59s)
* 03:29 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.62` on canary `wdqs1003`; proceeding to rest of fleet
* 03:29 ryankemper@deploy1001: Started deploy [wdqs/wdqs@ad9db35]: 0.3.62
* 03:26 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.62`. Pre-deploy tests passing on canary `wdqs1003`
* 03:21 ryankemper: `sudo systemctl restart wdqs-blazegraph` on `wdqs1006`
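
The `dbctl commit` entries above repool es1022 and db1094 in a fixed ramp of 10%, 25%, 50%, 75% and 100%, with about 15 minutes between steps. Below is a minimal sketch of how such a timetable can be computed, assuming that ramp and that interval. `repool_schedule` is a hypothetical helper and is not part of dbctl or conftool. It only returns times and percentages and never changes pooling state.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

# Ramp observed in the es1022 and db1094 entries above.
RAMP_STEPS = (10, 25, 50, 75, 100)


def repool_schedule(start: datetime,
                    steps: Sequence[int] = RAMP_STEPS,
                    interval: timedelta = timedelta(minutes=15)) -> List[Tuple[datetime, int]]:
    """Return (time, percent) pairs for a staged repool.

    Hypothetical helper: it only computes a timetable and never calls dbctl.
    """
    if not steps or steps[-1] != 100:
        raise ValueError("the last step must be 100 (fully pooled)")
    if any(later <= earlier for earlier, later in zip(steps, steps[1:])):
        raise ValueError("steps must be strictly increasing")
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]


if __name__ == "__main__":
    # Start time of the first es1022 repool step on 2021-02-02.
    for when, pct in repool_schedule(datetime(2021, 2, 2, 6, 30)):
        print(f"{when:%H:%M} es1022 @ {pct}%")
```

The logged times drift slightly (07:16 and 07:31 instead of 07:15 and 07:30), so the fixed interval is only an approximation.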

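The WDQS deploy entries above restart `wdqs-categories` on one host at a time (`cumin -b 1`), running a depool, restart, pool sequence on each host. They also restart `wdqs-updater` on four hosts at a time (`-b 4`). The sketch below models only that batching idea in plain Python. Note the following:
* `batches`, `rolling_restart` and `fake_restart` are hypothetical names.
* Skipping later batches after a failure is an assumption of this sketch, not documented cumin behaviour.
* The 45-second sleeps from the logged command are omitted.

```python
from typing import Callable, List, Sequence


def batches(hosts: Sequence[str], size: int) -> List[List[str]]:
    """Split hosts into consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [list(hosts[i:i + size]) for i in range(0, len(hosts), size)]


def rolling_restart(hosts: Sequence[str], size: int,
                    restart_one: Callable[[str], bool]) -> List[str]:
    """Run restart_one on each batch in turn and return the hosts that succeeded.

    Assumption for this sketch: if any host in a batch fails, later batches
    are skipped, so at most one batch is left in a bad state.
    """
    succeeded: List[str] = []
    for batch in batches(hosts, size):
        results = {host: restart_one(host) for host in batch}
        succeeded.extend(host for host, ok in results.items() if ok)
        if not all(results.values()):
            break
    return succeeded


def fake_restart(host: str) -> bool:
    """Hypothetical stand-in that only prints the depool -> restart -> pool order."""
    for step in ("depool", "systemctl restart wdqs-categories", "pool"):
        print(f"{host}: {step}")
    return True


if __name__ == "__main__":
    # Example host names taken from the log; the real target is a cumin alias.
    print(rolling_restart(["wdqs1003", "wdqs1006", "wdqs2005"], 1, fake_restart))
```

A batch size of 1 keeps all but one node pooled while `wdqs-categories` restarts. The larger batch of 4 used for `wdqs-updater` finishes faster, but more hosts are affected at once.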

== 2021-02-01 ==
== 2021-10-14 ==
* 23:54 legoktm@deploy1001: Synchronized wmf-config/profiler.php: profiler: Send data to excimer-buster pipeline ([[phab:T273312|T273312]]) (duration: 00m 57s)
* 23:59 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 23:15 legoktm: depooling mw1403 and mw1405 for perf testing
* 23:58 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 23:14 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1405.eqiad.wmnet
* 23:56 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 56s)
* 23:14 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1403.eqiad.wmnet
* 23:49 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 23:14 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1278.eqiad.wmnet
* 23:48 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 23:05 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.
* 23:46 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 56s)
* 23:43 ejegg: updated payments-wiki from {{Gerrit|19d18c1852}} to {{Gerrit|0f48acea49}}
* 23:34 cjming@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikimediaEvents/includes/VectorPrefDiffInstrumentation.php: Backport: [[gerrit:730733{{!}}Change VectorPrefDiffInstrumentation stream name to `mediawiki.skin_diff` (T289622)]] (duration: 00m 56s)
* 23:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730936{{!}}allow sysops to add and remove users to other groups on ptwikivoyage (T292806)]] (duration: 00m 56s)
* 23:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 23:11 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730933{{!}}Add americanantiquarian.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T292918)]] (duration: 00m 57s)
* 23:11 mutante: mw1452 - re-pooled, scap pull
* 23:09 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:35 ryankemper: [[phab:T288231|T288231]] Ran puppet on `wdqs2006`, now back to the cookbook run
* 22:33 ryankemper: [[phab:T288231|T288231]] Forgot about running puppet-agent on `wdqs2006`; aborted cookbook run
* 22:33 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:33 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:32 ryankemper: [[phab:T288231|T288231]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/730795; proceeding to data-transfer on `wdqs2006`: `sudo rm -fv /srv/wdqs/data_loaded` on `wdqs2006` followed by `ryankemper@cumin1001:~$ sudo cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "streaming updater cutover for wdqs2005" --blazegraph_instance blazegraph --task-id [[phab:T288231|T288231]]`
* 22:31 mutante: depooling mw1452 for testing
* 22:28 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo pool`: transfer completed successfully; tests passing on host (used `ssh -L 9999:localhost:80 wdqs2005.codfw.wmnet` to establish tunnel)
* 22:23 dpifke@deploy1002: Finished deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]] (duration: 00m 05s)
* 22:23 dpifke@deploy1002: Started deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]]
* 22:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 22:07 eileen: civicrm revision changed from {{Gerrit|018d3b19fe}} to {{Gerrit|9b5e0d015b}}, config revision is {{Gerrit|781d6a1b1f}}
* 21:34 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:25 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:10 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 21:06 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 19:23 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:53 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:53 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=dagwiki --fix
* 18:47 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=frwiktionary --logwiki=metawiki 'TURK FASTER' 'ARTHUR MORGAN'
* 18:42 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'George Dum Fulton' 'George Fulton' # [[phab:T293403|T293403]]
* 18:41 urbanecm: UTC evening B&C done
* 18:40 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/extension.json: {{Gerrit|6da3523daaba85a4199721980c0a9c96b20697e7}}: Fix assessment quickview labels ([[phab:T292596|T292596]]) (duration: 01m 03s)
* 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8dffefd0d095abe3709dcc962d5d24f27b55869}}: Create Salima namespace for dagwiki ([[phab:T289911|T289911]]) (duration: 01m 04s)
* 18:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 18:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bccd4bc45498db8628567574d0bb3a23f8fb378}}: Add $wgSitename and $wgMetaNamespace for kswiki and kswiktionary ([[phab:T289752|T289752]], [[phab:T289767|T289767]]) (duration: 01m 04s)
* 18:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|262e588b44f126fb9e1aa933a3ca59b191b42bd7}}: Enable Growth mentor dashboard backend on all wikis ([[phab:T278920|T278920]]) (duration: 01m 05s)
* 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|41baa8c41d64510986f009b9be2d70dad0915f8c}}: Add new mediawiki.skin_diff event logging stream ([[phab:T289622|T289622]]) (duration: 01m 05s)
* 18:03 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:02 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 18:01 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 17:54 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:52 rzl: repooled mw1452 (with `sudo pool` so no auto log from conftool)
* 17:47 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:45 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw1452.eqiad.wmnet
* 17:42 rzl: depool mw1452 for training
* 17:32 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:31 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 17:29 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 16:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:44 ryankemper: [[phab:T288231|T288231]] Manually killed dangling `pigz` / `nc` processes on `wdqs2008` (and `wdqs2005` implicitly). Should be in the right state to restart the `data-transfer` cookbook again
* 16:41 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 16:37 elukey: drop kubeflow-kfserving* docker images from deneb
* 16:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:34 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 16:33 moritzm: installing node-ansi-regex security updates
* 16:28 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere (duration: 02m 24s)
* 16:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere
* 16:24 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 04s)
* 16:16 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad (duration: 02m 41s)
* 16:14 mbsantos@deploy1002: Started deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad
* 16:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:07 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 16:07 ryankemper: [[phab:T288231|T288231]] About to ctrl+c out of ongoing data transfer because puppet run following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/730794 restarted blazegraph; we'll manually disable updater and kick off the transfer again
* 16:04 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo run-puppet-agent --force`
* 15:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:54 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2008:~$ sudo depool`
* 15:52 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo depool`
* 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 15:13 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/VisualEditor/includes/VisualEditorHooks.php: Backport: [[gerrit:730729{{!}}Fix value of 'namespacesWithSubpages' in wgVisualEditorConfig (T293310)]] (duration: 01m 04s)
* 15:02 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 03s)
* 15:00 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:59 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 14:53 kormat: upgrading orchestrator.wm.o to 3.2.6-1 [[phab:T275784|T275784]]
* 14:49 jbond@cumin1001: conftool action : set/pooled=true; selector: name=
* 08:53 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 08:53 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14075 and previous config saved to /var/cache/conftool/dbconfig/20210201-084531-root.json
* 08:51 Emperor: removing pc1008 from orchestrator [[phab:T289119|T289119]]
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1089 from dbctl [[phab:T273417|T273417]]', diff saved to https://phabricator.wikimedia.org/P14074 and previous config saved to /var/cache/conftool/dbconfig/20210201-084523-marostegui.json
* 08:44 Emperor: removing pc1008 from tendril and zarcillo [[phab:T289119|T289119]]
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 4%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14073 and previous config saved to /var/cache/conftool/dbconfig/20210201-084211-root.json
* 08:42 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1008.eqiad.wmnet
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 7%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14072 and previous config saved to /var/cache/conftool/dbconfig/20210201-082933-root.json
* 08:31 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1008.eqiad.wmnet
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 2%: Slowly pooling db1166 for the first time', diff saved to https://phabricator.wikimedia.org/P14071 and previous config saved to /var/cache/conftool/dbconfig/20210201-082707-root.json
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'More weight for db2090 into API [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17241 and previous config saved to /var/cache/conftool/dbconfig/20210907-082952-marostegui.json
* 08:17 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1166 with minimal weight for the first time [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14070 and previous config saved to /var/cache/conftool/dbconfig/20210201-081554-marostegui.json
* 08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 5%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14069 and previous config saved to /var/cache/conftool/dbconfig/20210201-081429-root.json
* 08:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1166 to dbctl, depooled [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14068 and previous config saved to /var/cache/conftool/dbconfig/20210201-080520-marostegui.json
* 08:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 3%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14067 and previous config saved to /var/cache/conftool/dbconfig/20210201-075926-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 100%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17240 and previous config saved to /var/cache/conftool/dbconfig/20210907-080230-root.json
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 2%: Slowly pooling db1175 for the first time', diff saved to https://phabricator.wikimedia.org/P14066 and previous config saved to /var/cache/conftool/dbconfig/20210201-074422-root.json
* 07:52 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 100%: reimage to buster (now with fixed pool config) [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17239 and previous config saved to /var/cache/conftool/dbconfig/20210907-075235-kormat.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1175 with some more minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14065 and previous config saved to /var/cache/conftool/dbconfig/20210201-073603-marostegui.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'More weight for db2090 into API [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17238 and previous config saved to /var/cache/conftool/dbconfig/20210907-074901-marostegui.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 100%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14064 and previous config saved to /var/cache/conftool/dbconfig/20210201-070429-root.json
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 75%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17237 and previous config saved to /var/cache/conftool/dbconfig/20210907-074726-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 75%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14063 and previous config saved to /var/cache/conftool/dbconfig/20210201-064926-root.json
* 07:37 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 75%: reimage to buster (now with fixed pool config) [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17236 and previous config saved to /var/cache/conftool/dbconfig/20210907-073731-kormat.json
* 06:39 marostegui: Run analyze table on db2071 and db2102
* 07:37 godog: +100G for prometheus/k8s codfw
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 50%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14062 and previous config saved to /var/cache/conftool/dbconfig/20210201-063422-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Start to pool db2090 into API [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17235 and previous config saved to /var/cache/conftool/dbconfig/20210907-073436-marostegui.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1175 to dbctl, depooled [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14061 and previous config saved to /var/cache/conftool/dbconfig/20210201-062358-marostegui.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 50%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17234 and previous config saved to /var/cache/conftool/dbconfig/20210907-073222-root.json
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 25%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14060 and previous config saved to /var/cache/conftool/dbconfig/20210201-061919-root.json
* 07:22 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 50%: reimage to buster (now with fixed pool config) [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17233 and previous config saved to /var/cache/conftool/dbconfig/20210907-072227-kormat.json
* 06:10 marostegui: Upgrade db2071 and db2102 to 10.4.18
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 25%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17232 and previous config saved to /var/cache/conftool/dbconfig/20210907-071719-root.json
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 10%: After fixing replication', diff saved to https://phabricator.wikimedia.org/P14059 and previous config saved to /var/cache/conftool/dbconfig/20210201-060415-root.json
* 07:13 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094', diff saved to https://phabricator.wikimedia.org/P14058 and previous config saved to /var/cache/conftool/dbconfig/20210201-055851-marostegui.json
* 07:13 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 07:07 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: reimage to buster (now with fixed pool config) [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17231 and previous config saved to /var/cache/conftool/dbconfig/20210907-070724-kormat.json
* 07:07 kormat@cumin1001: dbctl commit (dc=all): 'Fixing db2118's pooling config [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17230 and previous config saved to /var/cache/conftool/dbconfig/20210907-070702-kormat.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 10%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17229 and previous config saved to /var/cache/conftool/dbconfig/20210907-070215-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2090 (re)pooling @ 5%: Slowly repool [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17228 and previous config saved to /var/cache/conftool/dbconfig/20210907-064711-root.json
* 05:15 marostegui: Optimize eowiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 05:15 marostegui: Optimize vecwiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 05:14 marostegui: Optimize kawiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
 
== 2021-09-06 ==
* 23:52 tstarling@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/SecurePoll/includes/Talliers/STVTallier.php: [[phab:T290000|T290000]] (duration: 00m 58s)
* 16:14 Amir1: Deployed patch for [[phab:T290394|T290394]]
* 15:01 Emperor: removing pc1007 from orchestrator [[phab:T289118|T289118]]
* 15:00 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:53 kormat@cumin1001: dbctl commit (dc=all): 'db2118 (re)pooling @ 25%: reimage to buster [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17226 and previous config saved to /var/cache/conftool/dbconfig/20210906-145341-kormat.json
* 14:50 Emperor: removing pc1007 from tendril and zarcillo [[phab:T289118|T289118]]
* 14:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pc1007.eqiad.wmnet
* 14:45 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1026.eqiad.wmnet
* 14:44 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1026.eqiad.wmnet
* 14:36 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
* 14:35 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc1007.eqiad.wmnet
* 14:22 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 14:19 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:715492{{!}}Set permission of creating short url to everyone everywhere (T267921 T267925)]], Part II (duration: 00m 57s)
* 14:17 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:715492{{!}}Set permission of creating short url to everyone everywhere (T267921 T267925)]], Part I (duration: 00m 59s)
* 14:12 moritzm: installing postgres 9.6 security updates
* 14:05 gehel: re-pooling wdqs1007, caught up on lag
* 13:56 jbond: update facter networking fact gerrit:715949
* 13:51 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:719118{{!}}ProductionServices: fix comment for rdb* servers]] (duration: 00m 58s)
* 13:42 moritzm: updated thirdparty/gitlab component to 14.0.10 [[phab:T284811|T284811]]
* 13:04 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:42 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:42 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 12:42 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 12:41 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 12:40 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 12:29 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:06 godog: silence statograph until thurs on alert1001 - [[phab:T290425|T290425]]
* 11:58 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=plwiki 'editor' 'editeditorprotected' # [[phab:T230103|T230103]]
* 11:56 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=<nowiki>{</nowiki>hewiki,lvwiki,srwiki,srwikibooks<nowiki>}</nowiki> 'autopatrol' 'editautopatrolprotected' # [[phab:T230103|T230103]]
* 11:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=etwiki 'autopatrol' 'editautopatrolprotected' # [[phab:T230103|T230103]]
* 11:50 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=dewiktionary 'autoreviewprotected' 'editautoreviewprotected' # [[phab:T230103|T230103]]
* 11:48 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript renameRestrictions.php --wiki=arwiki 'autoreview' 'editautoreviewprotected' # [[phab:T230103|T230103]]
* 11:07 urbanecm: EU B&C window done
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8d7cf8f7c3faaf3773940e96ba0cf599e725237}}: foundationwiki: Create editor group ([[phab:T205352|T205352]]) (duration: 00m 57s)
* 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f90862be8c7b540065da24c24f2e2ac0df5b9d07}}: Growth: Define wgGEMentorDashboardDiscoveryEnabled ([[phab:T289054|T289054]]) (duration: 00m 58s)
* 11:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.21/maintenance/renameRestrictions.php: {{Gerrit|18e43ecca7d25d2d93de2f98f3bf5b36f5d4b780}}: renameRestrictions.php: Update protected_titles as well ([[phab:T290398|T290398]]) (duration: 00m 59s)
* 10:39 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts mc1027.eqiad.wmnet
* 10:38 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 10:22 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
* 10:17 volans@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 09:22 gehel: depooling wdqs1007, catching up on lag
* 09:06 gehel: restart blazegraph and updater on wdqs1007
* 08:46 jbond: update networking fact - gerrit:715943
* 07:57 godog: fail sdw on ms-be1062, reported errors
* 07:51 moritzm: installing libssh security updates
* 07:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:45 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:44 moritzm: installing squashfs-tools security updates
* 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 06:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 06:28 marostegui: Optimize table mkwiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 06:26 marostegui: Optimize table bewiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 06:23 marostegui: Optimize table dewiki.flaggedtemplates in eqiad [[phab:T290057|T290057]]
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2090.codfw.wmnet with reason: REIMAGE
* 05:07 marostegui: Stop replication on db2090 (old s4 master) [[phab:T289650|T289650]] [[phab:T288803|T288803]]
* 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 (current master) from API [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17223 and previous config saved to /var/cache/conftool/dbconfig/20210906-050502-marostegui.json
* 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2090 [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17222 and previous config saved to /var/cache/conftool/dbconfig/20210906-050419-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2110 to s4 primary and set section read-write [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17221 and previous config saved to /var/cache/conftool/dbconfig/20210906-050140-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17220 and previous config saved to /var/cache/conftool/dbconfig/20210906-050048-root.json
* 05:00 marostegui: Starting s4 codfw failover from db2090 to db2110 - [[phab:T289650|T289650]]
* 04:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2110 with weight 0 [[phab:T289650|T289650]]', diff saved to https://phabricator.wikimedia.org/P17219 and previous config saved to /var/cache/conftool/dbconfig/20210906-040740-root.json
* 04:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 33 hosts with reason: Primary switchover s4 [[phab:T289650|T289650]]
* 04:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 33 hosts with reason: Primary switchover s4 [[phab:T289650|T289650]]
 
== 2021-09-05 ==
* 18:54 urbanecm: wikiadmin@10.192.0.119(ptwiki)> update protected_titles set pt_create_perm='editautoreviewprotected' where pt_create_perm='autoreviewer'; # [[phab:T290396|T290396]]
 
== 2021-09-04 ==
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 100%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17217 and previous config saved to /var/cache/conftool/dbconfig/20210904-133532-root.json
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 75%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17216 and previous config saved to /var/cache/conftool/dbconfig/20210904-132029-root.json
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 50%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17215 and previous config saved to /var/cache/conftool/dbconfig/20210904-130525-root.json
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 25%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17214 and previous config saved to /var/cache/conftool/dbconfig/20210904-125021-root.json
* 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 10%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17213 and previous config saved to /var/cache/conftool/dbconfig/20210904-123518-root.json
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2137:3314 (re)pooling @ 5%: Slowly repool [[phab:T290374|T290374]]', diff saved to https://phabricator.wikimedia.org/P17212 and previous config saved to /var/cache/conftool/dbconfig/20210904-122014-root.json
* 09:04 elukey: restart wmf_auto_restart_rsyslog.service on puppetdb1002
* 09:00 elukey: `systemctl reset-failed ifup@ens6.service` on puppetdb2002 - [[phab:T273026|T273026]]
* 03:02 rzl@cumin2001: dbctl commit (dc=all): 'Depool db2137:3314', diff saved to https://phabricator.wikimedia.org/P17210 and previous config saved to /var/cache/conftool/dbconfig/20210904-030231-rzl.json
 
== 2021-09-03 ==
* 21:49 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:30 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 19:33 krinkle@deploy1002: Finished deploy [integration/docroot@6492b3d]: {{Gerrit|I48480e89e5f6}} (duration: 00m 10s)
* 19:33 krinkle@deploy1002: Started deploy [integration/docroot@6492b3d]: {{Gerrit|I48480e89e5f6}}
* 19:26 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 19:04 ryankemper: [[phab:T290330|T290330]] `ryankemper@cumin1001:~$ sudo -E cumin 'P<nowiki>{</nowiki>wdqs2*<nowiki>}</nowiki>' 'sudo rm -fv /etc/cron.hourly/restart-blazegraph'` (Cleaned up manually created crons now that we have [somewhat hacky] systemd timers doing the same job)
* 17:42 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 17:40 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 17:35 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 17:17 ryankemper: [[phab:T290330|T290330]] Deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/717508 across `wdqs` fleet; codfw wdqs hosts will restart on average once per hour now to address ongoing availability issues for wdqs codfw
* 16:32 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 16:10 gehel: blazegraph (public codfw cluster) will now restart every hour - [[phab:T290330|T290330]]
* 15:53 jbond: enable puppet fleet wide post puppetdb database maintenance - [[phab:T263578|T263578]]
* 15:21 jbond: create lvm snapshot puppetdb2002_data_snapshot on ganeti2023 - [[phab:T263578|T263578]]
* 15:17 jbond: create lvm snapshot puppetdb1002_data_snapshot on ganeti1012 - [[phab:T263578|T263578]]
* 15:00 jbond: disable puppet fleet wide to perform puppetdb database maintenance - [[phab:T263578|T263578]]
* 14:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:20 mutante: mw2264 - scap pull
* 14:18 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:18 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:11 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1027.eqiad.wmnet
* 13:10 dcausse: installing openjdk-8-dbg on wdqs2007
* 13:04 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1027.eqiad.wmnet
* 13:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1023.eqiad.wmnet
* 12:48 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1023.eqiad.wmnet
* 12:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1035-1036].eqiad.wmnet
* 12:32 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1035-1036].eqiad.wmnet
* 12:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc[1028-1032].eqiad.wmnet
* 12:03 joal@deploy1002: Finished deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d] (duration: 00m 06s)
* 12:03 joal@deploy1002: Started deploy [analytics/refinery@7208d3d] (thin): Analytics hotfix deploy (bis) THIN [analytics/refinery@7208d3d]
* 12:03 joal@deploy1002: Finished deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d] (duration: 19m 16s)
* 11:56 dcausse@deploy1002: Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 19m 21s)
* 11:44 joal@deploy1002: Started deploy [analytics/refinery@7208d3d]: Analytics hotfix deploy (bis)[analytics/refinery@7208d3d]
* 11:42 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats from enwiki - [[phab:T289050|T289050]]
* 11:37 dcausse@deploy1002: Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA
* 11:36 dcausse@deploy1002: Finished deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA (duration: 01m 07s)
* 11:35 dcausse@deploy1002: Started deploy [wdqs/wdqs@8361ac9]: ban queries from a generic UA
* 10:58 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1028-1032].eqiad.wmnet
* 10:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc[1025-1026].eqiad.wmnet
* 10:47 joal@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): Deploy latest code on AQS new servers - test after failures (duration: 00m 32s)
* 10:46 joal@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): Deploy latest code on AQS new servers - test after failures
* 10:45 joal@deploy1002: deploy aborted: Deploy latest code on AQS new servers - test after failures (duration: 00m 05s)
* 10:45 joal@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-test): Deploy latest code on AQS new servers - test after failures
* 10:29 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 03s)
* 10:29 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 10:22 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 55s)
* 10:21 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 10:17 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 36s)
* 10:16 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 10:08 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 45s)
* 10:08 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 10:05 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 36s)
* 10:04 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 10:02 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 01m 25s)
* 10:01 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 10:00 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 01m 53s)
* 09:58 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 09:57 hnowlan@deploy1002: Finished deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts (duration: 00m 09s)
* 09:57 hnowlan@deploy1002: Started deploy [analytics/aqs/deploy@d273fde] (aqs-next): deploying aqs to inactive aqs-next hosts
* 09:32 joal@deploy1002: Finished deploy [analytics/refinery@4ff8979] (thin): Analytics hotfix deploy THIN [analytics/refinery@4ff8979] (duration: 00m 07s)
* 09:32 joal@deploy1002: Started deploy [analytics/refinery@4ff8979] (thin): Analytics hotfix deploy THIN [analytics/refinery@4ff8979]
* 09:26 joal@deploy1002: Finished deploy [analytics/refinery@4ff8979]: Analytics hotfix deploy [analytics/refinery@4ff8979] (duration: 17m 36s)
* 09:25 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc[1025-1026].eqiad.wmnet
* 09:15 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 09:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1022.eqiad.wmnet
* 09:13 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 09:09 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 09:09 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:09 joal@deploy1002: Started deploy [analytics/refinery@4ff8979]: Analytics hotfix deploy [analytics/refinery@4ff8979]
* 09:08 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 09:06 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 09:03 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 09:03 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 08:53 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:52 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:45 ema: cp-eqsin: clean apt cache to free up some space [[phab:T290305|T290305]]
* 08:45 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1022.eqiad.wmnet
* 08:23 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 07:43 legoktm: uploaded pygments 2.10.0+dfsg-1~wmf1 to apt.wm.o in component/pygments
* 07:42 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats from several s3 wikis - [[phab:T289050|T289050]]
* 07:10 godog: more weight to ms-be20[62-65] - [[phab:T288458|T288458]]
* 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:57 marostegui@cumin1001: START - Cookbook sre.dns.netbox
* 06:45 elukey: run `apt-get clean` on cp5012 to free some space (94% of the root partition used)
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17203 and previous config saved to /var/cache/conftool/dbconfig/20210903-061204-root.json
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17202 and previous config saved to /var/cache/conftool/dbconfig/20210903-061138-root.json
* 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17201 and previous config saved to /var/cache/conftool/dbconfig/20210903-055700-root.json
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17200 and previous config saved to /var/cache/conftool/dbconfig/20210903-055635-root.json
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17199 and previous config saved to /var/cache/conftool/dbconfig/20210903-054157-root.json
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17198 and previous config saved to /var/cache/conftool/dbconfig/20210903-054131-root.json
* 05:30 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts pc2007.codfw.wmnet
* 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17196 and previous config saved to /var/cache/conftool/dbconfig/20210903-052653-root.json
* 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17195 and previous config saved to /var/cache/conftool/dbconfig/20210903-052628-root.json
* 05:20 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts pc2007.codfw.wmnet
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3314 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17194 and previous config saved to /var/cache/conftool/dbconfig/20210903-051149-root.json
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2138:3312 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17193 and previous config saved to /var/cache/conftool/dbconfig/20210903-051124-root.json
* 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2138 for upgrade', diff saved to https://phabricator.wikimedia.org/P17192 and previous config saved to /var/cache/conftool/dbconfig/20210903-050423-marostegui.json
* 00:31 tgr@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: Backport: [[gerrit:716491{{!}}fixLinkRecommendationData: Try harder to avoid >10K result sets (T284531)]] (duration: 00m 58s)
* 00:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
 
== 2021-09-02 ==
* 23:12 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704171{{!}}Adding wordmark for ptwikinews mobile and desktop skins (T281591)]] Part II (duration: 00m 57s)
* 23:11 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikinews-wordmark-pt.svg: Config: [[gerrit:704171{{!}}Adding wordmark for ptwikinews mobile and desktop skins (T281591)]] Part I (duration: 01m 14s)
* 21:47 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 21:37 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 21:17 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 19:57 ejegg: updated fundraising CiviCRM from {{Gerrit|7ac13753c7}} to {{Gerrit|06ef98593f}}
* 19:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:48 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mc1021.eqiad.wmnet
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:40 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1021.eqiad.wmnet
* 19:28 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.21  refs [[phab:T281162|T281162]]
* 18:31 ryankemper: [WCQS] `wcqs100[1-3],wcqs200[1-3]` downtimed until `2021-09-09 20:29:55` (UTC)
* 18:28 ryankemper: [WCQS] Merged & deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/713946, going to suppress icinga alerts on `wcqs*` hosts because these are still in the process of being spun up properly and aren't serving traffic or anything
* 18:24 ryankemper@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 18:24 ryankemper@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 18:20 ryankemper@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 18:20 ryankemper@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 17:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:57 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:18 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:09 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1020.eqiad.wmnet
* 15:53 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1020.eqiad.wmnet
* 15:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1019.eqiad.wmnet
* 15:31 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 15:28 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 15:26 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1019.eqiad.wmnet
* 15:16 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts mc1033.eqiad.wmnet
* 15:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mc1034.eqiad.wmnet
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17178 and previous config saved to /var/cache/conftool/dbconfig/20210902-150412-root.json
* 14:50 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1034.eqiad.wmnet
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17177 and previous config saved to /var/cache/conftool/dbconfig/20210902-144908-root.json
* 14:49 jiji@cumin1001: START - Cookbook sre.hosts.decommission for hosts mc1033.eqiad.wmnet
* 14:47 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:39 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:38 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:38 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 14:35 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17176 and previous config saved to /var/cache/conftool/dbconfig/20210902-143405-root.json
* 14:33 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:32 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 14:22 moritzm: installing exiv2 security updates
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17175 and previous config saved to /var/cache/conftool/dbconfig/20210902-141901-root.json
* 14:13 moritzm: installing ffmpeg security updates
* 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17174 and previous config saved to /var/cache/conftool/dbconfig/20210902-140357-root.json
* 14:00 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 13:57 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 13:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2136 for upgrade', diff saved to https://phabricator.wikimedia.org/P17173 and previous config saved to /var/cache/conftool/dbconfig/20210902-134838-marostegui.json
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17172 and previous config saved to /var/cache/conftool/dbconfig/20210902-134448-root.json
* 13:42 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:42 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:41 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 13:39 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 13:39 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 13:38 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 13:38 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 13:36 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 13:35 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17171 and previous config saved to /var/cache/conftool/dbconfig/20210902-132945-root.json
* 13:29 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 13:24 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 13:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 13:14 jbond: reimage sretest1002 (not sretest1001)
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17169 and previous config saved to /var/cache/conftool/dbconfig/20210902-131441-root.json
* 13:14 jbond: reimage sretest1001
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17168 and previous config saved to /var/cache/conftool/dbconfig/20210902-125937-root.json
* 12:55 jbond: disable puppet fleet-wide to roll out 715728
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2119 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17167 and previous config saved to /var/cache/conftool/dbconfig/20210902-124434-root.json
* 12:42 marostegui: Upgrade db2119
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2119 for upgrade', diff saved to https://phabricator.wikimedia.org/P17166 and previous config saved to /var/cache/conftool/dbconfig/20210902-124102-marostegui.json
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17165 and previous config saved to /var/cache/conftool/dbconfig/20210902-122826-root.json
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17164 and previous config saved to /var/cache/conftool/dbconfig/20210902-121323-root.json
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17163 and previous config saved to /var/cache/conftool/dbconfig/20210902-115819-root.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17162 and previous config saved to /var/cache/conftool/dbconfig/20210902-114315-root.json
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17161 and previous config saved to /var/cache/conftool/dbconfig/20210902-112812-root.json
* 11:26 urbanecm@deploy1002: Synchronized README: testing scap (duration: 01m 06s)
* 11:22 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2264.codfw.wmnet
* 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2106 for upgrade', diff saved to https://phabricator.wikimedia.org/P17160 and previous config saved to /var/cache/conftool/dbconfig/20210902-111843-marostegui.json
* 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3ce5d80eb6f8ad720b5d9c0b6ad7840dd869735e}}: dewiki: Enable Growth features for 30% of newcomers ([[phab:T288420|T288420]]) (duration: 01m 58s)
* 11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:04 urbanecm: metawiki: Server-side page move from VRT -> Volunteer Response Team ([[phab:T290083|T290083]])
* 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17158 and previous config saved to /var/cache/conftool/dbconfig/20210902-110022-root.json
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17155 and previous config saved to /var/cache/conftool/dbconfig/20210902-104518-root.json
* 10:38 mbsantos: REINDEX database gis in maps1009 while it's in depooled state
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17152 and previous config saved to /var/cache/conftool/dbconfig/20210902-103014-root.json
* 10:24 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:23 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:19 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17150 and previous config saved to /var/cache/conftool/dbconfig/20210902-101511-root.json
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2073 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17147 and previous config saved to /var/cache/conftool/dbconfig/20210902-100007-root.json
* 09:57 marostegui: Upgrade db2073
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2073 for upgrade', diff saved to https://phabricator.wikimedia.org/P17145 and previous config saved to /var/cache/conftool/dbconfig/20210902-095601-marostegui.json
* 09:56 hashar@deploy1002: Finished deploy [integration/docroot@973ac8a]: Support listing files on index pages - [[phab:T289196|T289196]] (duration: 00m 07s)
* 09:55 hashar@deploy1002: Started deploy [integration/docroot@973ac8a]: Support listing files on index pages - [[phab:T289196|T289196]]
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17142 and previous config saved to /var/cache/conftool/dbconfig/20210902-092026-root.json
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17141 and previous config saved to /var/cache/conftool/dbconfig/20210902-090523-root.json
* 08:55 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats from eowiki,idwiki,plwiki,trwiki - [[phab:T289050|T289050]]
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17140 and previous config saved to /var/cache/conftool/dbconfig/20210902-085019-root.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17138 and previous config saved to /var/cache/conftool/dbconfig/20210902-083515-root.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17136 and previous config saved to /var/cache/conftool/dbconfig/20210902-082012-root.json
* 08:14 marostegui: Upgrade db2140
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2140 for upgrade', diff saved to https://phabricator.wikimedia.org/P17135 and previous config saved to /var/cache/conftool/dbconfig/20210902-081436-marostegui.json
* 07:57 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
* 07:51 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
* 07:44 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats on huwiki - [[phab:T289050|T289050]]
* 07:44 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats on arwiki - [[phab:T289050|T289050]]
* 07:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:00 marostegui: Stop mariadb on pc2007 before decommissioning [[phab:T289112|T289112]]
* 06:59 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Remove pc2007 [[phab:T289112|T289112]] (duration: 01m 06s)
* 06:13 eileen: civicrm revision changed from {{Gerrit|ad37f21a7d}} to {{Gerrit|7ac13753c7}}, config revision is {{Gerrit|5f004d94d7}}
* 04:50 marostegui: Remove flaggedrevs_stats2 and flaggedrevs_stats on ruwiki - [[phab:T289050|T289050]]
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:05 krinkle@deploy1002: Synchronized php-1.37.0-wmf.21/extensions/WikimediaMaintenance/blameStartupRegistry.php: {{Gerrit|I63bf1922af593b7a144ef5f6d036f9a5e23cec09}} (duration: 01m 07s)
 
== 2021-09-01 ==
* 23:50 Amir1: mwscript createAndPromote.php --wiki=test2wiki --sysop --force Ladsgroup
* 23:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: {{Gerrit|0bd65426494d4df981141650211e27e17c98ee0c}}: fixLinkRecommendationData: stay under 10K search limit ([[phab:T284531|T284531]]) (duration: 01m 06s)
* 23:27 eileen: civicrm revision changed from {{Gerrit|30cd9c1d90}} to {{Gerrit|ad37f21a7d}}, config revision is {{Gerrit|5f004d94d7}}
* 23:25 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 23:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: {{Gerrit|3c7d4ecc699b7c68467a372686f5514375d2b74f}}: fixLinkRecommendationData: Allow --db-table in dry-run mode ([[phab:T283868|T283868]]) (duration: 01m 06s)
* 23:20 urbanecm@deploy1002: Synchronized wmf-config/extension-list: {{Gerrit|91ff9273fd9f80b571771a7454d34d63f43405b8}}: Enable NearbyPages on beta cluster ([[phab:T246493|T246493]]; 3/3) (duration: 01m 05s)
* 23:19 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 23:18 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|91ff9273fd9f80b571771a7454d34d63f43405b8}}: Enable NearbyPages on beta cluster ([[phab:T246493|T246493]]; 2/3) (duration: 01m 06s)
* 23:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|91ff9273fd9f80b571771a7454d34d63f43405b8}}: Enable NearbyPages on beta cluster ([[phab:T246493|T246493]]; 1/3) (duration: 01m 06s)
* 23:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:15 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 23:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bb7d92c48edf48b94fd628e9e0b5fd6682460373}}: Enable WVUI search on Wikimedia Commons ([[phab:T287215|T287215]]) (duration: 01m 07s)
* 23:04 dpifke@deploy1002: Finished deploy [performance/navtiming@63c9d31]: Deploy fix for CpuBenchmark-related Prometheus timeouts [[phab:T281243|T281243]] (duration: 00m 06s)
* 23:04 dpifke@deploy1002: Started deploy [performance/navtiming@63c9d31]: Deploy fix for CpuBenchmark-related Prometheus timeouts [[phab:T281243|T281243]]
* 22:44 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 22:43 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 22:43 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 22:43 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 22:42 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 22:42 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 22:40 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 22:39 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 22:35 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 22:34 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 22:33 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 22:33 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 22:32 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 22:32 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 22:30 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 22:29 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:57 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.21  refs [[phab:T281161|T281161]] (duration: 01m 06s)
* 19:57 twentyafterfour: twentyafterfour@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.21  refs [[phab:T281162|T281162]]
* 19:56 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.21  refs [[phab:T281161|T281161]]
* 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fe1ae2e438841a069dc8dadc9a1850b91863c06a}}: Growth features: Deploy to 100% of newcomers on small wikis ([[phab:T289786|T289786]]) (duration: 01m 06s)
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|27e85b1f228dccb584b4692f5b1b1354b19625b4}}: nlwiki: Enable link recommendations for all Growth users ([[phab:T285254|T285254]]) (duration: 01m 06s)
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|94b1cca}}: Growth features: Enable for newcomers on two wikis ([[phab:T285254|T285254]], [[phab:T287867|T287867]]) (duration: 01m 09s)
* 17:31 ejegg: updated payments-wiki from {{Gerrit|c4d56178d0}} to {{Gerrit|f9cbf95a12}}
* 16:23 mforns@deploy1002: Finished deploy [analytics/refinery@ff15071] (thin): Fix for cassandra3 loading THIN [analytics/refinery@ff15071] (duration: 00m 06s)
* 16:23 mforns@deploy1002: Started deploy [analytics/refinery@ff15071] (thin): Fix for cassandra3 loading THIN [analytics/refinery@ff15071]
* 16:22 mforns@deploy1002: Finished deploy [analytics/refinery@ff15071]: Fix for cassandra3 loading [analytics/refinery@ff15071] (duration: 26m 58s)
* 16:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1066.eqiad.wmnet with reason: REIMAGE
* 16:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1065.eqiad.wmnet with reason: REIMAGE
* 16:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1064.eqiad.wmnet with reason: REIMAGE
* 16:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1066.eqiad.wmnet with reason: REIMAGE
* 16:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1065.eqiad.wmnet with reason: REIMAGE
* 16:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1064.eqiad.wmnet with reason: REIMAGE
* 15:55 mforns@deploy1002: Started deploy [analytics/refinery@ff15071]: Fix for cassandra3 loading [analytics/refinery@ff15071]
* 15:35 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:08 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:04 godog: move simone-this-dot from wmf to nda ldap group - [[phab:T289783|T289783]]
* 13:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
* 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:48 krinkle@deploy1002: Synchronized php-1.37.0-wmf.20/includes/resourceloader: {{Gerrit|Id7c258841d7816}} (duration: 01m 06s)
* 13:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.21/includes/resourceloader: {{Gerrit|Id7c258841d7816}} (duration: 01m 49s)
* 13:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
* 13:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:16 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 13:05 mutante: planet1002 - temp removing feed from ad.huikeshoven - seems to cause corrupt state file ([[phab:T289984|T289984]])
* 13:01 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 12:48 godog: s/webperf/navtiming/
* 12:47 godog: bounce webperf on webperf2001 - [[phab:T290138|T290138]]
* 12:41 mutante: planet1002 - rm /etc/rawdog/en/feeds/39a7970f.state (corrupt) [[phab:T289984|T289984]]
* 12:38 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 11:19 Krinkle: effie restarted php-fpm on parse2007.codfw.wmnet, ref [[phab:T290120|T290120]].
* 10:21 jbond: start filtering more puppet facts G:715461 - [[phab:T263578|T263578]]
* 09:23 marostegui: Drop flaggedrevs_stats and flaggedrevs_stats2 from dewiki [[phab:T289050|T289050]]
* 07:45 ema: deploy Varnish SLO dashboard with grr apply slo_dashboards.jsonnet [[phab:T289036|T289036]]
* 07:05 XioNoX: pfw NAT and ACLs changes - [[phab:T290077|T290077]]
* 06:29 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for sodium.wikimedia.org: Renew puppet certificate - elukey@cumin1001
* 06:28 elukey@cumin1001: START - Cookbook sre.puppet.renew-cert for sodium.wikimedia.org: Renew puppet certificate - elukey@cumin1001
* 05:25 effie: depool mw2251 mw2255 parse2001 for tests - [[phab:T280497|T280497]]
* 04:41 marostegui: Optimize idwiki.flaggedtemplates [[phab:T290057|T290057]]
* 04:23 marostegui: Optimize arwiki.flaggedtemplates [[phab:T290057|T290057]]
* 04:16 eileen: civicrm revision changed from {{Gerrit|7da3eba4f9}} to {{Gerrit|30cd9c1d90}}, config revision is {{Gerrit|5f004d94d7}}
* 00:53 eileen: civicrm revision changed from {{Gerrit|e567b4c289}} to {{Gerrit|7da3eba4f9}}, config revision is {{Gerrit|5f004d94d7}}
 
== 2021-08-31 ==
* 23:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:38 eileen: civicrm revision changed from {{Gerrit|718aa9cad3}} to {{Gerrit|e567b4c289}}, config revision is {{Gerrit|7a24870bc7}}
* 23:33 dpifke@deploy1002: Synchronized wmf-config/profiler.php: Revert excimer-k8s pipelines [[phab:T288165|T288165]] (duration: 01m 14s)
* 23:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:25 dpifke@deploy1002: scap failed: average error rate on 3/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 23:23 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:15 mforns: failed deployment of refinery (v0.1.17) to an-test-coord1001.eqiad.wmnet (scap error)
* 23:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:14 mforns@deploy1002: Finished deploy [analytics/refinery@a0f039b] (hadoop-test): Regular analytics weekly train TEST v0.1.17 [analytics/refinery@a0f039b] (duration: 13m 42s)
* 23:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1437d99c1884c0695f02b81b724ec82a2bd3362e}}: Enable link recommendation frontent in dewiki and nlwiki ([[phab:T288420|T288420]], [[phab:T285254|T285254]]) (duration: 01m 06s)
* 23:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8997ae5d0b998839853aed2b246f5c88fe9d83eb}}: Fix wgDiscussionTools_sourcemodetoolbar settings (duration: 01m 22s)
* 23:01 mforns@deploy1002: Started deploy [analytics/refinery@a0f039b] (hadoop-test): Regular analytics weekly train TEST v0.1.17 [analytics/refinery@a0f039b]
* 23:00 mforns@deploy1002: Finished deploy [analytics/refinery@a0f039b] (thin): Regular analytics weekly train THIN v0.1.17 [analytics/refinery@a0f039b] (duration: 00m 07s)
* 23:00 mforns@deploy1002: Started deploy [analytics/refinery@a0f039b] (thin): Regular analytics weekly train THIN v0.1.17 [analytics/refinery@a0f039b]
* 23:00 mforns@deploy1002: Finished deploy [analytics/refinery@a0f039b]: Regular analytics weekly train v0.1.17 [analytics/refinery@a0f039b] (duration: 17m 39s)
* 22:42 mforns@deploy1002: Started deploy [analytics/refinery@a0f039b]: Regular analytics weekly train v0.1.17 [analytics/refinery@a0f039b]
* 21:58 ejegg: switched Adyen to new Checkout integration
* 21:41 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 21:38 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 21:34 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:00 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.21  refs [[phab:T281161|T281161]]
* 19:20 brennen: gitlab1001: brief downtime for testing reconfiguration of cas3.session_duration
* 19:05 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.21  refs [[phab:T281161|T281161]] (duration: 35m 53s)
* 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:40 ejegg: switched Adyen back to HPP integration
* 18:38 ejegg: updated payments-wiki from {{Gerrit|564daed816}} to {{Gerrit|c4d56178d0}}, switched Adyen to Checkout integration
* 18:30 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.21  refs [[phab:T281161|T281161]]
* 18:24 twentyafterfour: ran `scap prep 1.37.0-wmf.21` and `scap apply-patches --train 1.37.0-wmf.21` refs [[phab:T281162|T281162]]
* 18:05 XioNoX: re-pool eqsin-codfw link
* 16:18 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:14 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:08 hnowlan@deploy1002: Finished deploy [restbase/deploy@09156c2]: fix core Title redirect loop (duration: 16m 02s)
* 15:52 hnowlan@deploy1002: Started deploy [restbase/deploy@09156c2]: fix core Title redirect loop
* 14:30 jbond: re-enable puppet fleet-wide after performing puppetdb maintenance [[phab:T263578|T263578]]
* 14:29 hashar: Restarting CI Jenkins for plugins upgrade
* 14:19 ottomata: merged a change to service_auto_restart.pp that makes service-name matching more explicit. Tested in deployment prep with no problems; logging in case something goes wrong in prod. https://gerrit.wikimedia.org/r/c/operations/puppet/+/697605
* 14:09 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:09 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:07 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:05 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:05 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:03 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:02 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on puppetdb2002.codfw.wmnet with reason: puppetdb maintance - [[phab:T289779|T289779]]
* 14:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on puppetdb2002.codfw.wmnet with reason: puppetdb maintance - [[phab:T289779|T289779]]
* 14:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on puppetdb1002.eqiad.wmnet with reason: puppetdb maintance - [[phab:T289779|T289779]]
* 14:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on puppetdb1002.eqiad.wmnet with reason: puppetdb maintance - [[phab:T289779|T289779]]
* 14:01 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:00 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:47 jbond: disable puppet fleet-wide to perform puppetdb maintenance [[phab:T263578|T263578]]
* 13:41 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:37 urbanecm: Start `mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php --wiki=nlwiki --verbose` in a tmux session at mwmaint2002
* 13:28 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1010.eqiad.wmnet
* 13:06 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 13:04 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 12:59 urbanecm: [urbanecm@mwmaint2002 ~]$ sudo -u www-data kill 133282 # stop updateMenteeData.php at frwiki
* 12:52 jelto: run kubectl scale deployments.apps -n ci mediawiki-bruce --replicas=0 to stop ImagePulling and reduce io on kubestage1001
* 12:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 12:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:38 jbond: sudo  gnt-instance modify --disk add:size=100G  puppetdb2002.codfw.wmnet [[phab:T263578|T263578]]
* 11:38 jbond: sudo gnt-instance modify --disk add:size=100G puppetdb1002.eqiad.wmnet [[phab:T263578|T263578]]
* 11:37 jbond: sudo  gnt-instance modify --disk add:size=100G  puppetdb2002.codfw.wmnet
* 11:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|53a1856128edb4ec3a5ea8840fb6755a1703f7ac}}: updateMenteeData: Send timing to statsd ([[phab:T278971|T278971]]) (duration: 00m 57s)
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:07 urbanecm: EU B&C window done
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eb482e3fa88a87166b990fd9b87d0ccbbf971290}}: Offer the DiscussionTools reply tool as opt-out setting at 21 phase 2 Wikipedias ([[phab:T288483|T288483]]) (duration: 00m 57s)
* 10:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
* 10:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1010.eqiad.wmnet with reason: Resyncing from master
* 10:23 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1010.eqiad.wmnet
* 10:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
* 10:14 marostegui: Optimize huwiki.flaggedtemplates [[phab:T290057|T290057]]
* 10:11 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
* 08:39 marostegui: Optimize plwiki.flaggedtemplates [[phab:T290057|T290057]]
* 08:18 marostegui: Optimize cewiki.flaggedtemplates [[phab:T290057|T290057]]
* 08:05 marostegui: Optimize plwiktionary.flaggedtemplates [[phab:T290057|T290057]]
* 07:44 marostegui: Optimize ruwiki.flaggedtemplates [[phab:T290057|T290057]]
* 07:01 XioNoX: drain eqsin-codfw link
* 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17113 and previous config saved to /var/cache/conftool/dbconfig/20210831-065600-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17112 and previous config saved to /var/cache/conftool/dbconfig/20210831-064056-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17111 and previous config saved to /var/cache/conftool/dbconfig/20210831-062553-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 25%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17110 and previous config saved to /var/cache/conftool/dbconfig/20210831-061049-root.json
* 06:06 marostegui: Rename flaggedrevs_stats2 and flaggedrevs_stats on dewiki codfw [[phab:T289050|T289050]]
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2110 (re)pooling @ 10%: Slowly repool after reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17109 and previous config saved to /var/cache/conftool/dbconfig/20210831-055546-root.json
* 03:39 eileen: civicrm revision changed from {{Gerrit|e89504652a}} to {{Gerrit|718aa9cad3}}, config revision is {{Gerrit|cb0a008cad}}
* 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:04 eileen: tools revision changed from {{Gerrit|14e4125f73}} to {{Gerrit|1d67c52c12}}


== 2021-01-29 ==
== 2021-08-30 ==
* 23:26 razzi@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 23:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:36 dancy@deploy1001: Finished scap: MW servers complaining about l10n files after .27 rollback (duration: 07m 22s)
* 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:29 dancy@deploy1001: Started scap: MW servers complaining about l10n files after .27 rollback
* 23:11 urbanecm: Evening B&C done
* 22:26 dancy@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
* 23:11 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/includes/Specials/SpecialMentorDashboard.php: {{Gerrit|9e2264a0c9a48548da4795b2a5b9d7275d254ac7}}: Instrument Special:MentorDashboard ([[phab:T289369|T289369]]) (duration: 00m 55s)
* 22:20 reedy@deploy1001: Synchronized php-1.36.0-wmf.27/includes/parser/CacheTime.php: CacheTime: Extra protection for rollback unserialization [[phab:T273007|T273007]] (duration: 01m 00s)
* 23:08 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/includes/Specials/SpecialHomepage.php: {{Gerrit|9e2264a0c9a48548da4795b2a5b9d7275d254ac7}}: Instrument Special:MentorDashboard ([[phab:T289369|T289369]]) (duration: 00m 57s)
* 22:14 dancy@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.28
* 21:56 eileen: civicrm revision changed from {{Gerrit|13bf3a02df}} to {{Gerrit|e89504652a}}, config revision is {{Gerrit|cb0a008cad}}
* 22:09 dancy@deploy1001: scap failed: average error rate on 8/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 19:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:42 razzi: rebalance kafka partitions for codfw.resource_change
* 19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:40 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 19:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9a92e2ae7526717a0a42b825a34b4595e75a544b}}: Fix mediawiki.mentor_dashboard.visits definition (duration: 00m 56s)
* 19:26 razzi@cumin1001: END (FAIL) - Cookbook sre.kafka.reboot-workers (exit_code=99) for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 19:08 tgr: morning deploys done for real
* 19:26 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 19:06 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:715579{{!}}Fix schema definition for mediawiki.mentor_dashboard.visit (T289369)]] (duration: 00m 56s)
* 18:50 hashar: CI slightly overloaded due to a surge of library updates but is otherwise processing changes
* 19:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:31 reedy@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/WikiEditor/modules/jquery.wikiEditor.toolbar.config.js: [[phab:T273231|T273231]] (duration: 01m 02s)
* 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:56 effie: depool mw1403 and mw1405
* 18:49 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert: [[gerrit:715529{{!}}Add mediawiki.mentor_dashboard.visit schema (T289369)]] (duration: 00m 26s)
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-presto1001.eqiad.wmnet
* 18:48 tgr@deploy1002: Scap failed!: 5/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 15:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-presto1001.eqiad.wmnet
* 18:43 tgr: morning deploys done
* 14:58 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1007.eqiad.wmnet with reason: REIMAGE
* 18:43 tgr@deploy1002: scap failed: average error rate on 3/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 14:56 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1007.eqiad.wmnet with reason: REIMAGE
* 18:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:50 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:50 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:50 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 18:22 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:715568{{!}}GrowthExperiments: Enable link recommendation for dewiki and nlwiki (T288420 T285254)]] (duration: 00m 56s)
* 13:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:48 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 18:14 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:714548{{!}}GrowthExperiments: Switch image recommendations flag off (T288797)]] (duration: 00m 57s)
* 13:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 17:44 ryankemper: [WDQS Deploy] Test query passing on `query.wikidata.org` and icinga looks good. This deploy is done.
* 13:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 17:12 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 17:12 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 17:12 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 17:10 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@a17833c]: 0.3.84 (duration: 08m 16s)
* 13:05 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 17:04 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.84` on canary `wdqs1003`; proceeding to rest of fleet
* 13:05 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 17:02 ryankemper@deploy1002: Started deploy [wdqs/wdqs@a17833c]: 0.3.84
* 13:05 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 17:02 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.84`. Pre-deploy tests passing on canary `wdqs1003`
* 13:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 17:00 ryankemper: [[phab:T289483|T289483]] Pooled `wdqs1013`
* 13:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 16:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
* 13:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 16:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
* 12:38 hnowlan: uploaded osmborder_0.1.0-2~buster0 package to buster-wikimedia
* 16:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Resyncing from master
* 12:00 gilles@deploy1001: Finished deploy [performance/coal@b0d3b59]: [[phab:T271208|T271208]] Filter out canary events (duration: 00m 06s)
* 16:20 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Resyncing from master
* 12:00 gilles@deploy1001: Started deploy [performance/coal@b0d3b59]: [[phab:T271208|T271208]] Filter out canary events
* 16:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
* 11:42 dcausse@deploy1001: Synchronized wmf-config/unitConversionConfig.json: [[phab:T270252|T270252]]: Update unitConversionConfig.json (duration: 01m 01s)
* 16:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1007.eqiad.wmnet
* 11:39 gilles@deploy1001: Finished deploy [performance/navtiming@ae8310a]: [[phab:T271208|T271208]] Fix canary event check (duration: 00m 05s)
* 16:16 sukhe: running authdns-update for Gerrit 715499
* 11:39 gilles@deploy1001: Started deploy [performance/navtiming@ae8310a]: [[phab:T271208|T271208]] Fix canary event check
* 14:44 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 11:26 gilles@deploy1001: Finished deploy [performance/navtiming@e7712c3]: [[phab:T271208|T271208]] Log instead of hard error on missing wiki field (duration: 00m 06s)
* 14:21 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
* 11:26 gilles@deploy1001: Started deploy [performance/navtiming@e7712c3]: [[phab:T271208|T271208]] Log instead of hard error on missing wiki field
* 14:21 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from master
* 11:06 gilles@deploy1001: Finished deploy [performance/navtiming@125f6be]: [[phab:T271208|T271208]] Ignore canary events (duration: 00m 05s)
* 14:21 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
* 11:06 gilles@deploy1001: Started deploy [performance/navtiming@125f6be]: [[phab:T271208|T271208]] Ignore canary events
* 14:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet
* 11:04 elukey: upload presto-* version 0.246-1 packages to buster/stretch-wikimedia
* 14:18 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:54 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:45 jynus@cumin1001: START - Cookbook sre.hosts.decommission
* 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14050 and previous config saved to /var/cache/conftool/dbconfig/20210129-103505-root.json
* 13:55 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b17015395cc592e021a4ca8ce6f81b699bb77381}}: Growth mentor dashboard: Enable beta features only on beta wikis ([[phab:T280307|T280307]]) (duration: 00m 55s)
* 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14049 and previous config saved to /var/cache/conftool/dbconfig/20210129-102001-root.json
* 13:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f1a178e1d4d7c98a1988da68982f97848f390c68}}: knwiki: Disable wmgNewUserMessageOnAutoCreate ([[phab:T289333|T289333]]) (duration: 00m 57s)
* 10:18 vgutierrez: pool cp5006
* 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:17 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14048 and previous config saved to /var/cache/conftool/dbconfig/20210129-100458-root.json
* 13:48 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|6fbcc93f429ff3fbca98aeecdee4f33f022ca7c3}}: Add missing edit*protected rights to $wgAvailableRights (duration: 00m 56s)
* 09:51 jynus@cumin1001: START - Cookbook sre.hosts.decommission
* 12:12 Amir1: ladsgroup@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php --wiki=jvwikisource --backend=local-multiwrite ([[phab:T289860|T289860]])
* 09:50 vgutierrez: reboot cp5006
* 11:52 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14047 and previous config saved to /var/cache/conftool/dbconfig/20210129-094954-root.json
* 11:51 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14046 and previous config saved to /var/cache/conftool/dbconfig/20210129-093451-root.json
* 11:48 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:32 marostegui: Expand lvs on db1155-db1175 [[phab:T258361|T258361]]
* 11:47 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 09:31 vgutierrez: depool cp5006
* 11:31 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 08:20 marostegui: Change buffer pool sizes on clouddb1013,1015,1017,1019 [[phab:T267090|T267090]]
* 11:30 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 07:11 marostegui: Upgrade pc2007 to 10.4.18 [[phab:T268457|T268457]]
* 10:55 jelto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to clone db1175', diff saved to https://phabricator.wikimedia.org/P14044 and previous config saved to /var/cache/conftool/dbconfig/20210129-065529-marostegui.json
* 10:53 jelto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 03:35 marostegui: Reload haproxy1018
* 10:21 dcausse@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 02:42 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2251.codfw.wmnet
* 09:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:42 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2252.codfw.wmnet
* 09:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2252.codfw.wmnet
* 09:34 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:703476{{!}}Set $wgIncludejQueryMigrate to false in group0 (T280944)]] (duration: 00m 57s)
* 02:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2251.codfw.wmnet
* 09:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 02:04 krinkle@deploy1001: Synchronized wmf-config/profiler.php: {{Gerrit|If0c71a983772c}} (duration: 00m 58s)
* 09:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps1006.eqiad.wmnet with reason: Resyncing from master
* 01:49 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2252.codfw.wmnet with reason: REIMAGE
* 09:01 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 01:48 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2251.codfw.wmnet with reason: REIMAGE
* 09:00 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 01:46 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2252.codfw.wmnet with reason: REIMAGE
* 08:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1006.eqiad.wmnet
* 01:46 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2251.codfw.wmnet with reason: REIMAGE
* 08:57 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
* 01:09 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2253.codfw.wmnet
* 08:57 godog: +100G to prometheus/global in codfw
* 01:07 mutante: repooled mw2248,mw2249 - jobrunners/videoscalers now on buster
* 08:04 vgutierrez: pool cp2027 - [[phab:T289908|T289908]]
* 01:06 mutante: repooled mw2048,mw2049 - jobrunners/videoscalers now on buster
* 06:53 elukey: drop an-airflow1001's old airflow logs to free up the almost-full root partition
* 01:06 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2253.codfw.wmnet
* 06:38 godog: more weight to ms-be20[62-65] - [[phab:T288458|T288458]]
* 01:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2249.codfw.wmnet
* 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2110.codfw.wmnet with reason: REIMAGE
* 01:05 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2248.codfw.wmnet
* 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2110.codfw.wmnet with reason: REIMAGE
* 01:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2249.codfw.wmnet
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2110 for reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17105 and previous config saved to /var/cache/conftool/dbconfig/20210830-052336-marostegui.json
* 01:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2248.codfw.wmnet
* 00:19 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2261.codfw.wmnet
* 00:14 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2262.codfw.wmnet
* 00:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2283.codfw.wmnet


== 2021-01-28 ==
== 2021-08-29 ==
* 23:58 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2261.codfw.wmnet
* 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:58 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2262.codfw.wmnet
* 00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:57 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2283.codfw.wmnet
* 23:52 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2253.codfw.wmnet with reason: REIMAGE
* 23:49 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2253.codfw.wmnet with reason: REIMAGE
* 23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2248.codfw.wmnet with reason: REIMAGE
* 23:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2249.codfw.wmnet with reason: REIMAGE
* 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2248.codfw.wmnet with reason: REIMAGE
* 23:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2249.codfw.wmnet with reason: REIMAGE
* 23:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2283.codfw.wmnet with reason: reimaging
* 23:34 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2283.codfw.wmnet with reason: reimaging
* 23:33 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2262.codfw.wmnet with reason: REIMAGE
* 23:31 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2283.codfw.wmnet with reason: REIMAGE
* 23:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2261.codfw.wmnet with reason: REIMAGE
* 23:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2283.codfw.wmnet with reason: REIMAGE
* 23:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2262.codfw.wmnet with reason: REIMAGE
* 23:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2261.codfw.wmnet with reason: REIMAGE
* 23:14 mutante: reimaging jobrunners/videoscalers mw2248,mw2249
* 22:43 brennen@deploy1001: Synchronized php-1.36.0-wmf.27/includes/parser/CacheTime.php: [[gerrit:658688{{!}}CacheTime: Extra protection for rollback unserialization (T273007)]] (duration: 00m 57s)
* 22:41 bblack: eqiad lvs should be back to normal state now with everything working
* 22:39 bblack: lvs1014 - apply https://gerrit.wikimedia.org/r/659439
* 22:37 bblack: lvs1013 - testing https://gerrit.wikimedia.org/r/659439 (expect nop, worked on 1015!)
* 22:36 bblack: lvs1015 - testing https://gerrit.wikimedia.org/r/659439 (expect nop)
* 22:21 bblack: lvs1016 - trying https://gerrit.wikimedia.org/r/659439 on backup LVS...
* 22:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2287.codfw.wmnet
* 22:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2286.codfw.wmnet
* 22:20 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2285.codfw.wmnet
* 22:20 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2284.codfw.wmnet
* 22:16 bblack: disabling puppet on all eqiad lvs for https://gerrit.wikimedia.org/r/659439 risks
* 22:03 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2284.codfw.wmnet
* 22:03 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2285.codfw.wmnet
* 22:02 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2286.codfw.wmnet
* 22:02 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2287.codfw.wmnet
* 21:33 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 21:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1172.eqiad.wmnet with reason: REIMAGE
* 21:30 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: REIMAGE
* 21:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 21:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1172.eqiad.wmnet with reason: REIMAGE
* 21:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: REIMAGE
* 21:28 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.28
* 21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2287.codfw.wmnet with reason: reimaging
* 21:28 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2287.codfw.wmnet with reason: reimaging
* 21:27 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2285.codfw.wmnet with reason: reimaging
* 21:27 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2285.codfw.wmnet with reason: reimaging
* 21:27 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2284.codfw.wmnet with reason: REIMAGE
* 21:25 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2286.codfw.wmnet with reason: REIMAGE
* 21:23 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2285.codfw.wmnet with reason: REIMAGE
* 21:23 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2287.codfw.wmnet with reason: REIMAGE
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2284.codfw.wmnet with reason: REIMAGE
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2285.codfw.wmnet with reason: REIMAGE
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2287.codfw.wmnet with reason: REIMAGE
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2286.codfw.wmnet with reason: REIMAGE
* 21:19 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.28 (duration: 01m 05s)
* 21:17 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.28
* 21:15 brennen: 1.36.0-wmf.28 train status ([[phab:T271342|T271342]]): blockers resolved, going to group1, to be followed shortly by all wikis
* 21:11 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/CentralAuth/includes/: Backport: [[gerrit:659362{{!}}Revert CentralAuthCreateLocalAccountJob changes in 9f79de4 (T273205)]] (duration: 01m 09s)
* 20:49 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/tests/phpunit/includes/parser/ParserOptionsTest.php: Backport: [[gerrit:659103{{!}}Make ParserOptions::isSafeToCache more robust (T273120)]] (duration: 01m 07s)
* 20:46 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/includes/parser/ParserOptions.php: Backport: [[gerrit:659103{{!}}Make ParserOptions::isSafeToCache more robust (T273120)]] (duration: 01m 08s)
* 20:25 bblack: lvs1014,lvs1016 - all back to "normal" state
* 20:24 bblack: lvs1014 - restart pybal
* 20:20 bblack: lvs1016 - restart pybal
* 20:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@911731d]: write articletopic and drafttopic to hourly tables (duration: 01m 44s)
* 20:13 bblack: lvs1014,lvs1016 - puppet temporarily disabled for new service config deploy - [[phab:T271476|T271476]]
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2223.codfw.wmnet
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2247.codfw.wmnet
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1264.eqiad.wmnet
* 20:13 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@911731d]: write articletopic and drafttopic to hourly tables
* 20:13 mutante: scap pulling and repooling: mw1264, mw2223, mw2247
* 20:11 bstorm@cumin1001: conftool action : set/pooled=yes; selector: name=dbproxy1019.eqiad.wmnet
* 20:10 bstorm@cumin1001: conftool action : set/pooled=yes; selector: name=dbproxy1018.eqiad.wmnet
* 20:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2223.codfw.wmnet
* 20:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2247.codfw.wmnet
* 20:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1264.eqiad.wmnet
* 19:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
* 19:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
* 19:53 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ba1acd6]: airflow: start ores_predictions_daily one day earlier (duration: 01m 09s)
* 19:52 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ba1acd6]: airflow: start ores_predictions_daily one day earlier
* 19:45 Urbanecm: Run mwscript namespaceDupes.php --wiki=frwikisource --add-prefix=BROKEN --fix ([[phab:T271939|T271939]])
* 19:44 Urbanecm: Run mwscript namespaceDupes.php --wiki=frwikisource --fix ([[phab:T271939|T271939]])
* 19:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0ae49093893316657ffd7cf56669a470fb073352}}: frwikisource: Add WS as an alias to NS_PROJECT ([[phab:T271939|T271939]]) (duration: 00m 57s)
* 19:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fd18092fd8b73414f6c320895601c83b883e29ee}}: Add image.laji.fi to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T270587|T270587]]) (duration: 01m 04s)
* 19:36 jynus: extending backup1001 /dev/mapper/array1-archive partition to allocate enough space for helium backups [[phab:T238048|T238048]]
* 19:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|519350b86bd4afc8d4efc3c2f9b2631a0ced22c2}}: frwiktionary: Change babel category names per community request ([[phab:T270186|T270186]]) (duration: 00m 59s)
* 19:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 19:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3d0ca3a11a59063e5adfc126702032ea357e8524}}: Create patroller user group for thwiki ([[phab:T272149|T272149]]) (duration: 01m 07s)
* 19:20 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 19:19 mforns@deploy1001: Finished deploy [analytics/refinery@1e41f60] (thin): Regular analytics weekly train THIN [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562] (duration: 00m 08s)
* 19:19 mforns@deploy1001: Started deploy [analytics/refinery@1e41f60] (thin): Regular analytics weekly train THIN [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562]
* 19:15 mforns@deploy1001: Finished deploy [analytics/refinery@1e41f60]: Regular analytics weekly train [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562] (duration: 16m 53s)
* 19:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e914f1e65adfdf2f41af97363501b0ba3c40d5b8}}: robots: cawikimedia: Set wgDefaultRobotPolicy to noindex,nofollow ([[phab:T272871|T272871]]) (duration: 01m 08s)
* 19:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2247.codfw.wmnet with reason: REIMAGE
* 19:10 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@0742443]: hourly partitioning for ores tables (duration: 01m 25s)
* 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2223.codfw.wmnet with reason: REIMAGE
* 19:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2247.codfw.wmnet with reason: REIMAGE
* 19:09 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@0742443]: hourly partitioning for ores tables
* 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2223.codfw.wmnet with reason: REIMAGE
* 19:07 cdanis: decom Zayo IP transit on cr2-codfw [[phab:T272675|T272675]]
* 19:06 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable canary events for mediawiki_revision_recommendation_create (duration: 01m 12s)
* 19:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
* 19:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
* 18:58 cdanis: draining traffic from Zayo OGYX/123447 codfw<>ulsfo in preparation for decommission 🥃 [[phab:T272675|T272675]]
* 18:58 mforns@deploy1001: Started deploy [analytics/refinery@1e41f60]: Regular analytics weekly train [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562]
* 18:58 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Remove [[phab:T257687|T257687]] mitigations (duration: 01m 10s)
* 18:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1159.eqiad.wmnet with reason: REIMAGE
* 18:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1159.eqiad.wmnet with reason: REIMAGE
* 18:34 mutante: reimaging another canary appserver, mw1264, so that we will have at least 2 stretch and 2 buster canaries for the transitional period
* 18:30 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:26 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 17:49 jgleeson: fundraising-tools tools updated from {{Gerrit|41cab089da}} to {{Gerrit|d64b2f8cee}}
* 17:38 crusnov@deploy1001: Finished deploy [netbox/deploy@52d6fb9]: Test deploy of 2.10.4 to netbox-next [[phab:T265084|T265084]] (duration: 01m 18s)
* 17:37 crusnov@deploy1001: Started deploy [netbox/deploy@52d6fb9]: Test deploy of 2.10.4 to netbox-next [[phab:T265084|T265084]]
* 17:35 crusnov@deploy1001: Started deploy [netbox/deploy@52d6fb9]: Test deploy of 2.10.4 to netbox-next [[phab:T265084|T265084]]
* 17:28 ebernhardson: ban elastic1063 from production-search-omega-eqiad and production-search-eqiad [[phab:T265113|T265113]]
* 17:11 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 06s)
* 16:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
* 16:51 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 16:51 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 16:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
* 16:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 16:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 16:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
* 16:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 16:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 16:45 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:44 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:44 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:41 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:41 arturo: running homer on cr*-eqiad* again for reverting latest changes ([[phab:T271476|T271476]])
* 16:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
* 16:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 16:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 16:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'plain' .
* 16:26 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 16:25 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'plain' .
* 16:25 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 16:24 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:24 akosiaris: stop scraping apertium from prometheus, it doesn't have a prometheus endpoint.
* 16:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 16:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'plain' .
* 16:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 16:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:17 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:06 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:03 arturo: running homer on cr*-eqiad* for [[phab:T271476|T271476]]
* 15:55 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 15:54 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 15:52 cdanis: draining traffic from Zayo OGYX/120003 codfw<>eqiad in preparation for decommission 🥃 [[phab:T272675|T272675]]
* 15:49 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 15:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d0a6933]: align threshold path references across days (duration: 01m 15s)
* 15:49 marostegui: Power off clouddb1019 for memory replacement [[phab:T272125|T272125]]
* 15:48 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d0a6933]: align threshold path references across days
* 15:25 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate NavigationTiming schemas to Event Platform on all wikis - [[phab:T271208|T271208]] (duration: 01m 11s)
* 15:06 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:05 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:26 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:14 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148 after kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P14039 and previous config saved to /var/cache/conftool/dbconfig/20210128-141425-marostegui.json
* 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P14038 and previous config saved to /var/cache/conftool/dbconfig/20210128-135730-marostegui.json
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14037 and previous config saved to /var/cache/conftool/dbconfig/20210128-135612-root.json
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14036 and previous config saved to /var/cache/conftool/dbconfig/20210128-135602-root.json
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14035 and previous config saved to /var/cache/conftool/dbconfig/20210128-134109-root.json
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14034 and previous config saved to /var/cache/conftool/dbconfig/20210128-134057-root.json
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14033 and previous config saved to /var/cache/conftool/dbconfig/20210128-132605-root.json
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14032 and previous config saved to /var/cache/conftool/dbconfig/20210128-132553-root.json
* 13:17 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14031 and previous config saved to /var/cache/conftool/dbconfig/20210128-131101-root.json
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14030 and previous config saved to /var/cache/conftool/dbconfig/20210128-131050-root.json
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1024's weight', diff saved to https://phabricator.wikimedia.org/P14029 and previous config saved to /var/cache/conftool/dbconfig/20210128-125631-marostegui.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14028 and previous config saved to /var/cache/conftool/dbconfig/20210128-125558-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14027 and previous config saved to /var/cache/conftool/dbconfig/20210128-125546-root.json
* 12:48 dcausse: European mid-day backport window done
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14026 and previous config saved to /var/cache/conftool/dbconfig/20210128-123800-root.json
* 12:32 dcausse@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CirrusSearch/: Add an option to limit the size of the file_text field: [[phab:T271493|T271493]] (duration: 01m 09s)
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 80%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14025 and previous config saved to /var/cache/conftool/dbconfig/20210128-122256-root.json
* 12:22 marostegui: Reboot db1146:3312 db1146:3314
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312, db1146:3314 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P14024 and previous config saved to /var/cache/conftool/dbconfig/20210128-122118-marostegui.json
* 12:12 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T271493|T271493]]: [cirrus] set 50kb limit on file text indexing for commons (duration: 01m 09s)
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 70%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14023 and previous config saved to /var/cache/conftool/dbconfig/20210128-120752-root.json
* 12:07 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T266027|T266027]]: [cirrus] Swith to perfield builder for spaceless languages (duration: 01m 06s)
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14022 and previous config saved to /var/cache/conftool/dbconfig/20210128-115249-root.json
* 11:45 gilles@deploy1001: Finished deploy [performance/navtiming@446e5df]: (no justification provided) (duration: 00m 05s)
* 11:45 gilles@deploy1001: Started deploy [performance/navtiming@446e5df]: (no justification provided)
* 11:37 vgutierrez: upgrade pybal to 1.15.9 in esams
* 11:30 elukey: disable nginx proxy buffering on archiva.wikimedia.org for a perf test - [[phab:T252767|T252767]]
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 30%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14020 and previous config saved to /var/cache/conftool/dbconfig/20210128-112242-root.json
* 11:21 vgutierrez: upgrade pybal to 1.15.9 in eqiad
* 11:20 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 10%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14019 and previous config saved to /var/cache/conftool/dbconfig/20210128-110739-root.json
* 11:04 marostegui: Restart mysql on es1025  [[phab:T266483|T266483]]
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1025 [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P14018 and previous config saved to /var/cache/conftool/dbconfig/20210128-110353-marostegui.json
* 11:01 _joe_: restarting php-fpm on the appserver, api and jobrunner clusters in eqiad, 10% at a time, for simulating scap rolling restarts [[phab:T266055|T266055]]
* 10:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es5 on writes [[phab:T266483|T266483]] (duration: 01m 05s)
* 10:46 marostegui: Restart mysql on es1024  [[phab:T266483|T266483]]
* 10:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es5 from writes [[phab:T266483|T266483]] (duration: 01m 09s)
* 10:33 _joe_: performing a test-run of the rolling restart of php-fpm in codfw, using the same code scap will use [[phab:T266055|T266055]]. Starting from the api cluster, then proceeding with others
* 10:15 _joe_: upgrading pybal on lvs2008
* 10:11 _joe_: upgrading pybal on lvs2009
* 10:10 vgutierrez: upgrade pybal to 1.15.9 in eqsin
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14017 and previous config saved to /var/cache/conftool/dbconfig/20210128-095642-root.json
* 09:48 _joe_: upgrading pybal to 1.15.9 in codfw, starting from lvs2010
* 09:47 jbond42: upload new cas package to apt
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 80%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14016 and previous config saved to /var/cache/conftool/dbconfig/20210128-094139-root.json
* 09:30 _joe_: upgrading pybal on lvs4006
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 70%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14015 and previous config saved to /var/cache/conftool/dbconfig/20210128-092635-root.json
* 09:25 _joe_: upgrading pybal on lvs4005
* 09:11 _joe_: installing pybal 1.15.9 on lvs4007
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 60%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14014 and previous config saved to /var/cache/conftool/dbconfig/20210128-091131-root.json
* 09:08 moritzm: installing perf updates on Stretch
* 09:06 marostegui: Testing wikitech
* 09:00 _joe_: uploading pybal 1.15.9 to apt.wikimedia.org
* 08:58 moritzm: installing perf updates on Buster
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14013 and previous config saved to /var/cache/conftool/dbconfig/20210128-085627-root.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 40%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14012 and previous config saved to /var/cache/conftool/dbconfig/20210128-084123-root.json
* 08:34 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14011 and previous config saved to /var/cache/conftool/dbconfig/20210128-083347-root.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14010 and previous config saved to /var/cache/conftool/dbconfig/20210128-083337-root.json
* 08:32 vgutierrez: pool cp1087 - [[phab:T273153|T273153]]
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 30%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14009 and previous config saved to /var/cache/conftool/dbconfig/20210128-082620-root.json
* 08:20 vgutierrez: restart purged on cp1087 - [[phab:T273153|T273153]]
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14008 and previous config saved to /var/cache/conftool/dbconfig/20210128-081843-root.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14007 and previous config saved to /var/cache/conftool/dbconfig/20210128-081834-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 20%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14006 and previous config saved to /var/cache/conftool/dbconfig/20210128-081116-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14005 and previous config saved to /var/cache/conftool/dbconfig/20210128-080340-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14004 and previous config saved to /var/cache/conftool/dbconfig/20210128-080330-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 15%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14003 and previous config saved to /var/cache/conftool/dbconfig/20210128-075613-root.json
* 07:54 moritzm: installing tomcat9 security updates
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14002 and previous config saved to /var/cache/conftool/dbconfig/20210128-074836-root.json
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14001 and previous config saved to /var/cache/conftool/dbconfig/20210128-074827-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1169 some more minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14000 and previous config saved to /var/cache/conftool/dbconfig/20210128-073426-marostegui.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13999 and previous config saved to /var/cache/conftool/dbconfig/20210128-073333-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13998 and previous config saved to /var/cache/conftool/dbconfig/20210128-073323-root.json
* 07:25 elukey: powercycle cp1087 (after depooling it)
* 07:24 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P13997 and previous config saved to /var/cache/conftool/dbconfig/20210128-072154-marostegui.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P13996 and previous config saved to /var/cache/conftool/dbconfig/20210128-072120-marostegui.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1169 some more minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13995 and previous config saved to /var/cache/conftool/dbconfig/20210128-072036-marostegui.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1169 to s1 for the first time, with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13994 and previous config saved to /var/cache/conftool/dbconfig/20210128-063806-marostegui.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1169 to dbctl [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13993 and previous config saved to /var/cache/conftool/dbconfig/20210128-063655-marostegui.json
* 03:03 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 03:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 02:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2291.codfw.wmnet
* 02:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2290.codfw.wmnet
* 02:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2288.codfw.wmnet
* 02:05 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2288.codfw.wmnet
* 02:05 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2290.codfw.wmnet
* 02:05 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2291.codfw.wmnet
* 02:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 02:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 01:35 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2291.codfw.wmnet with reason: REIMAGE
* 01:35 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:35 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:33 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2291.codfw.wmnet with reason: REIMAGE
* 01:33 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2290.codfw.wmnet with reason: REIMAGE
* 01:32 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:32 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:32 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:32 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:31 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2288.codfw.wmnet with reason: REIMAGE
* 01:31 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2290.codfw.wmnet with reason: REIMAGE
* 01:31 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2288.codfw.wmnet with reason: REIMAGE
* 01:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 01:10 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2294.codfw.wmnet
* 01:09 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2293.codfw.wmnet
* 01:09 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2292.codfw.wmnet
* 00:56 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2294.codfw.wmnet
* 00:56 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2293.codfw.wmnet
* 00:56 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2292.codfw.wmnet
* 00:50 Urbanecm: Evening B&C done
* 00:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87c304c5439b1b7898f951db61d0a0a8a11ee4f7}}: Disable max-width on page namespace for wikisource ([[phab:T260091|T260091]]; 2nd take) (duration: 01m 00s)
* 00:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1404.eqiad.wmnet
* 00:41 foks: reset email for User:Uwe Martens
* 00:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1404.eqiad.wmnet
* 00:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1404.wmnet
* 00:33 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/includes/: {{Gerrit|c5c39ba8b3fce3f946e161191b814446aa5c1f4b}}: Fix fetching ipblock-exempt within BlockManager::getUserBlock ([[phab:T271551|T271551]], [[phab:T270145|T270145]]) (duration: 01m 04s)
* 00:32 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2293.codfw.wmnet with reason: reimaging
* 00:32 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2293.codfw.wmnet with reason: reimaging
* 00:31 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/includes/: {{Gerrit|a67fe4f7cbf172b82153aaceaa93a067cdff2ae4}}: Fix fetching ipblock-exempt within BlockManager::getUserBlock ([[phab:T271551|T271551]], [[phab:T270145|T270145]]) (duration: 01m 07s)
* 00:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2292.codfw.wmnet with reason: REIMAGE
* 00:26 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/GrowthExperiments/includes/HomepageModules/BaseModule.php: {{Gerrit|5417e0c8518b54144b99c963a1bbff3d15a00b32}}: Fix BaseModule::BASE_CSS_CLASS visibility ([[phab:T273099|T273099]]) (duration: 01m 00s)
* 00:26 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2294.codfw.wmnet with reason: REIMAGE
* 00:24 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2293.codfw.wmnet with reason: REIMAGE
* 00:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2292.codfw.wmnet with reason: REIMAGE
* 00:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2293.codfw.wmnet with reason: REIMAGE
* 00:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2294.codfw.wmnet with reason: REIMAGE
* 00:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 00:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1404.eqiad.wmnet with reason: REIMAGE
* 00:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1404.eqiad.wmnet with reason: REIMAGE
* 00:12 urbanecm@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty


== 2021-08-28 ==
* 23:30 shdubsh: reboot logstash2006
* 23:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:12 elukey: powercycle cp2027 - OEM event registered in racadm getsel, no tty, no ssh
* 09:11 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2027.codfw.wmnet


== 2021-01-27 ==
* 22:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2246.codfw.wmnet
* 22:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2246.codfw.wmnet
* 22:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2222.codfw.wmnet
* 22:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2222.codfw.wmnet
* 22:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1405.eqiad.wmnet
* 22:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1405.eqiad.wmnet
* 21:57 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.28
* 21:51 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae24e12]: repoint ores thresholds to yesterday (duration: 02m 23s)
* 21:48 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ae24e12]: repoint ores thresholds to yesterday
* 21:17 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@1c9d487]: airflow: hourly tasks must wait for yesterdays daily task (duration: 07m 54s)
* 21:09 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1c9d487]: airflow: hourly tasks must wait for yesterdays daily task
* 21:09 ebernhardson@deploy1001: deploy aborted: airflow: hourly tasks must wait for yesterdays daily tank (duration: 00m 00s)
* 21:09 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1c9d487]: airflow: hourly tasks must wait for yesterdays daily tank
* 20:58 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/includes/libs/objectcache/RedisBagOStuff.php: Backport: [[gerrit:658780{{!}}objectcache: fix broken for loop in RedisBagOStuff::doSetMulti() (T273006)]] (duration: 01m 07s)
* 20:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2246.codfw.wmnet with reason: REIMAGE
* 20:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2246.codfw.wmnet with reason: REIMAGE
* 20:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2222.codfw.wmnet with reason: REIMAGE
* 20:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2222.codfw.wmnet with reason: REIMAGE
* 20:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2299.codfw.wmnet
* 20:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2217.codfw.wmnet
* 20:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2217.codfw.wmnet
* 20:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2221.codfw.wmnet
* 20:37 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2221.codfw.wmnet
* 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1405.eqiad.wmnet with reason: REIMAGE
* 20:30 brennen: 1.36.0-wmf.28 ([[phab:T271342|T271342]]): taking over train while dancy is afk; waiting on [[gerrit:658939]] to merge and will sync for verification on testwikis
* 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1405.eqiad.wmnet with reason: REIMAGE
* 20:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2216.codfw.wmnet
* 20:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2218.codfw.wmnet
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2219.codfw.wmnet
* 20:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1263.eqiad.wmnet
* 20:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2216.codfw.wmnet
* 20:07 urbanecm@deploy1001: Synchronized logos/config.yaml: {{Gerrit|6c5dd65e6138eb32db8059720a2149d4728763e7}}: Undeploy cswiki birthday logo (duration: 01m 05s)
* 20:06 urbanecm@deploy1001: Synchronized wmf-config/logos.php: {{Gerrit|6c5dd65e6138eb32db8059720a2149d4728763e7}}: Undeploy cswiki birthday logo (duration: 01m 06s)
* 20:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2218.codfw.wmnet
* 20:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2219.codfw.wmnet
* 20:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1263.eqiad.wmnet
* 19:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2221.codfw.wmnet with reason: REIMAGE
* 19:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2221.codfw.wmnet with reason: REIMAGE
* 19:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|53419ab6c0f2c306a68edb8979106bd42536211a}}: arwiki: Configure wgGEHomepageManualAssignmentMentorsList ([[phab:T273060|T273060]]) (duration: 00m 59s)
* 19:19 elukey: reboot an-launcher1002 for kernel upgrades
* 19:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cabb2e2009f97bb86c1b8827c3cc61cc991c41a9}}: Declare 6 more NavigationTiming eventlogging streams and migrate on testwiki ([[phab:T271208|T271208]]) (duration: 01m 00s)
* 19:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9382a9879bd6823fd664c0d3721fd0a9dc0d56d8}}: Migrate WebUIActionsTracking schemas to Event Platform on all wikis ([[phab:T267347|T267347]],[[phab:T271164|T271164]]) (duration: 01m 03s)
* 19:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2215.codfw.wmnet
* 18:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2215.codfw.wmnet
* 18:50 mutante: testreduce1001 - making nginx listen on IPv6 and restarting it [[phab:T266509|T266509]]
* 18:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
* 18:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
* 18:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2219.codfw.wmnet with reason: REIMAGE
* 18:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2219.codfw.wmnet with reason: REIMAGE
* 18:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2218.codfw.wmnet with reason: REIMAGE
* 18:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2218.codfw.wmnet with reason: REIMAGE
* 18:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2217.codfw.wmnet with reason: REIMAGE
* 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2217.codfw.wmnet with reason: REIMAGE
* 18:30 Tchanders: Creating the table securepoll_log in votewiki and testwiki ([[phab:T271270|T271270]])
* 18:25 hashar@deploy1001: Finished deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4 (duration: 00m 07s)
* 18:25 hashar@deploy1001: Started deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4
* 18:25 hashar@deploy1001: Finished deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4 (duration: 00m 10s)
* 18:25 hashar@deploy1001: Started deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4
* 18:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 18:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 18:15 dpifke@deploy1001: Finished deploy [performance/arc-lamp@e24f319]: Re-deploying ArcLamp to webperf1002 (duration: 00m 05s)
* 18:15 dpifke@deploy1001: Started deploy [performance/arc-lamp@e24f319]: Re-deploying ArcLamp to webperf1002
* 18:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2301.codfw.wmnet
* 18:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1406.eqiad.wmnet
* 18:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2216.codfw.wmnet with reason: REIMAGE
* 18:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1407.eqiad.wmnet
* 18:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2216.codfw.wmnet with reason: REIMAGE
* 18:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1407.eqiad.wmnet
* 18:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2301.codfw.wmnet
* 18:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1406.eqiad.wmnet
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2215.codfw.wmnet with reason: REIMAGE
* 17:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2215.codfw.wmnet with reason: REIMAGE
* 17:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2301.codfw.wmnet with reason: REIMAGE
* 17:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2301.codfw.wmnet with reason: REIMAGE
* 17:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1406.eqiad.wmnet with reason: REIMAGE
* 17:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1407.eqiad.wmnet with reason: REIMAGE
* 17:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1406.eqiad.wmnet with reason: REIMAGE
* 17:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1407.eqiad.wmnet with reason: REIMAGE
* 17:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 17:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 16:54 elukey@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 16:40 elukey@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 16:38 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 16:21 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 16:18 moritzm: installing python-bottle security updates
* 15:42 elukey: umount /var/hadoop/data/r on an-worker1099 and restart hadoop daemons - [[phab:T273034|T273034]]
* 15:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate 5 NavigationTiming schemas to Event Platform on group0 and group1 - [[phab:T271208|T271208]] (duration: 01m 07s)
* 15:15 godog: bounce rsyslog on centrallog1001
* 13:52 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:52 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:48 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:48 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:43 kharlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:25 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 13:20 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13989 and previous config saved to /var/cache/conftool/dbconfig/20210127-123300-root.json
* 12:25 awight: EU bacon done
* 12:25 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:658594{{!}}Enable bracket matching on the first wikis (T270238)]] (duration: 01m 07s)
* 12:20 awight@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CodeMirror: Backport: [[gerrit:658814{{!}}Improve matchbrackets performance when moving the cursor (T270317)]] (duration: 01m 06s)
* 12:19 awight@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/CodeMirror: Backport: [[gerrit:658815{{!}}Improve matchbrackets performance when moving the cursor (T270317)]] (duration: 01m 14s)
* 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13988 and previous config saved to /var/cache/conftool/dbconfig/20210127-121756-root.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13987 and previous config saved to /var/cache/conftool/dbconfig/20210127-120253-root.json
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13986 and previous config saved to /var/cache/conftool/dbconfig/20210127-114749-root.json
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13985 and previous config saved to /var/cache/conftool/dbconfig/20210127-113245-root.json
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P13984 and previous config saved to /var/cache/conftool/dbconfig/20210127-105735-marostegui.json
* 10:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2004.codfw.wmnet
* 10:23 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2004.codfw.wmnet
* 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2003.codfw.wmnet
* 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with final weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13982 and previous config saved to /var/cache/conftool/dbconfig/20210127-102042-marostegui.json
* 10:18 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2003.codfw.wmnet
* 10:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1004.eqiad.wmnet
* 10:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1004.eqiad.wmnet
* 10:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1003.eqiad.wmnet
* 10:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1003.eqiad.wmnet
* 10:05 elukey: reboot matomo1002 for kernel upgrades
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13981 and previous config saved to /var/cache/conftool/dbconfig/20210127-100220-marostegui.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13980 and previous config saved to /var/cache/conftool/dbconfig/20210127-093802-marostegui.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13979 and previous config saved to /var/cache/conftool/dbconfig/20210127-091909-marostegui.json
* 09:04 jbond42: deploy fix to enable-puppet
* 09:03 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13978 and previous config saved to /var/cache/conftool/dbconfig/20210127-083618-marostegui.json
* 08:29 marostegui: Stop mysql on db1089 to clone db1169 [[phab:T258361|T258361]]
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 to clone db1169 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13976 and previous config saved to /var/cache/conftool/dbconfig/20210127-082826-marostegui.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P13975 and previous config saved to /var/cache/conftool/dbconfig/20210127-081150-marostegui.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P13974 and previous config saved to /var/cache/conftool/dbconfig/20210127-080753-marostegui.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13973 and previous config saved to /var/cache/conftool/dbconfig/20210127-080645-root.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1160 some more small weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13972 and previous config saved to /var/cache/conftool/dbconfig/20210127-075715-marostegui.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13971 and previous config saved to /var/cache/conftool/dbconfig/20210127-075142-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13970 and previous config saved to /var/cache/conftool/dbconfig/20210127-073638-root.json
* 07:26 elukey: powercycle analytics1073 - kernel soft lockup bug registered, OS needs a reboot
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13969 and previous config saved to /var/cache/conftool/dbconfig/20210127-072135-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13968 and previous config saved to /var/cache/conftool/dbconfig/20210127-070502-marostegui.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1160 some more small weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13967 and previous config saved to /var/cache/conftool/dbconfig/20210127-065715-marostegui.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1160 some more small weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13966 and previous config saved to /var/cache/conftool/dbconfig/20210127-063930-marostegui.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13965 and previous config saved to /var/cache/conftool/dbconfig/20210127-061336-marostegui.json
* 06:03 twentyafterfour: phabricator appears to be up and running fine
* 06:03 twentyafterfour: phabricator is read-write
* 06:01 twentyafterfour: phabricator is read-only
* 06:00 marostegui: m3 master restart, phabricator will go on read only - [[phab:T272596|T272596]]
* 05:50 marostegui: Deploy schema change on s3 [[phab:T270055|T270055]]
* 03:48 ryankemper: (Restarted `wdqs-blazegraph` on `wdqs1012`)
* 02:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@9c85a21]: transfer_to_es: start date 2020 -> 2021 (duration: 02m 59s)
* 02:21 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@9c85a21]: transfer_to_es: start date 2020 -> 2021
* 01:58 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 01:57 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 01:57 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 01:56 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@6c6b2cb]: 0.3.61 (duration: 07m 50s)
* 01:50 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.61` on canary `wdqs1003`; proceeding to rest of fleet
* 01:48 ryankemper@deploy1001: Started deploy [wdqs/wdqs@6c6b2cb]: 0.3.61
* 01:48 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.61`. Pre-deploy tests passing on canary `wdqs1003`
* 01:39 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ee948e0]: transfer_to_es: Enable catchup (duration: 01m 11s)
* 01:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ee948e0]: transfer_to_es: Enable catchup
* 01:25 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2296.codfw.wmnet
* 01:25 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2295.codfw.wmnet
* 01:24 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Roll-out complete. Will monitor `wdqs-internal` for any issues. All the remaining `WDQS SPARQL` alerts should clear shortly
* 01:21 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Test queries to `wdqs1003.eqiad.wmnet` passed, and metrics in Grafana (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs-internal&from=1611706751381&to=1611710190405) look good. Rolling out to rest of fleet
* 01:21 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2296.codfw.wmnet
* 01:20 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2295.codfw.wmnet
* 01:14 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@246b640]: remove link recommendations from hourly transfer deps (duration: 03m 31s)
* 01:10 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@246b640]: remove link recommendations from hourly transfer deps
* 00:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2296.codfw.wmnet with reason: REIMAGE
* 00:52 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2295.codfw.wmnet with reason: REIMAGE
* 00:51 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Fixed typo in private key in commit `ea152df802b55e939d34494a4965ed83a80a24f2`. Puppet run on `wdqs1003` was successful as a result. Monitoring...
* 00:49 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2295.codfw.wmnet with reason: REIMAGE
* 00:49 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2296.codfw.wmnet with reason: REIMAGE
* 00:45 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Discovered source of the above failure; the secret key in the puppetmaster `/srv/private` repo has a typo in its name (my error): it had `wqds` instead of `wdqs`. Opening up a patch now
* 00:45 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] `...Error while evaluating a Function Call, secret(): invalid secret ssl/wdqs-internal.discovery.wmnet.key (file: /etc/puppet/modules/sslcert/manifests/certificate.pp, line: 91, column: 26) (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 129) on node wdqs1003.eqiad.wmnet`
* 00:36 ryankemper: [Deploy envoy for `wdqs-internal`] `...Error while evaluating a Function Call, secret(): invalid secret ssl/wdqs-internal.discovery.wmnet.key (file: /etc/puppet/modules/sslcert/manifests/certificate.pp, line: 91, column: 26) (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 129) on node wdqs1003.eqiad.wmnet`
* 00:20 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Disabled puppet on all `wdqs-internal` hosts; merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/657913
* 00:16 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2008.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:16 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2008.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Downtimed all `wdqs-internal` hosts on icinga
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2006.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2006.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1008.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1008.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1003.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:14 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1003.eqiad.wmnet with reason: Enabling envoy for wdqs-internal


== 2021-08-27 ==
* 16:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 16:46 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 14:50 akosiaris: stop flink on staging cluster to verify some IOPS starvation issues
* 14:46 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:45 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 14:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 14:38 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 14:37 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 14:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 14:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 13:48 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 12:49 mutante: rsynced /srv/org/wikimedia/racktables from miscweb1002 to miscweb2002 ([[phab:T269746|T269746]])
* 12:04 topranks: removing peering to Wave Division Holdings / AS11404 at Equinix Chicago cr2-eqord, AS no longer on exchange.
* 10:56 akosiaris: sudo cumin 'mw*' 'ip ro ls dev docker0 && sysctl net.ipv4.ip_forward=0' to clear up the docker remnants of the dragonfly evaluation. [[phab:T286054|T286054]]
* 10:31 godog: bounce logstash on logstash1007
* 10:22 elukey: fallback codfw ores to rdb2007 after maintenance
* 10:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
* 10:12 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
* 09:49 elukey: restart ores uwsgi/celery workers to failover rdb2007 to rdb2008 (and ease the reboot of rdb2007)
* 09:33 topranks: Running homer against mr1-ulsfo to force OOB interface to 100Mb/full-duplex - [[phab:T288343|T288343]]
* 09:25 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Update to expose int type from Netbox - cmooney@cumin1001
* 09:25 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Update to expose int type from Netbox - cmooney@cumin1001
* 09:23 cmooney@deploy1002: Finished deploy [homer/deploy@8183056]: Homer update exposing interface type from Netbox - [[phab:T288343|T288343]] (duration: 01m 28s)
* 09:21 cmooney@deploy1002: Started deploy [homer/deploy@8183056]: Homer update exposing interface type from Netbox - [[phab:T288343|T288343]]
* 08:05 tstarling@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/SecurePoll/cli/wm-scripts/sendMail.php: (no justification provided) (duration: 00m 56s)
* 07:49 jayme: stopped kube-apiserver on kubestagemaster2001 for testing
* 07:49 jayme: stopped kube-apiserver on kubestage2001 for testing
* 07:00 godog: bounce logstash on logstash1008
* 06:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:41 tstarling@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/SecurePoll/cli/wm-scripts/sendMail.php: (no justification provided) (duration: 00m 56s)
* 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:44 legoktm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/PageTriage/: Revert backbone.js and underscore.js updates ([[phab:T289825|T289825]]) (duration: 01m 06s)


== 2021-01-26 ==
* 23:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2297.codfw.wmnet
* 23:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2298.codfw.wmnet
* 23:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2302.codfw.wmnet
* 23:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1264.eqiad.wmnet
* 23:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2297.codfw.wmnet
* 23:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1264.eqiad.wmnet
* 23:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2298.codfw.wmnet
* 23:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2299.codfw.wmnet
* 23:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2302.codfw.wmnet
* 22:35 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a276626]: correct execution_date_fn in ores_predictions_hourly (duration: 01m 07s)
* 22:34 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a276626]: correct execution_date_fn in ores_predictions_hourly
* 22:30 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2300.codfw.wmnet
* 22:27 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2300.codfw.wmnet
* 22:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
* 22:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
* 22:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2297.codfw.wmnet with reason: REIMAGE
* 22:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2297.codfw.wmnet with reason: REIMAGE
* 22:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2298.codfw.wmnet with reason: REIMAGE
* 22:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2299.codfw.wmnet with reason: REIMAGE
* 22:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2298.codfw.wmnet with reason: REIMAGE
* 22:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2299.codfw.wmnet with reason: REIMAGE
* 21:58 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2300.codfw.wmnet with reason: REIMAGE
* 21:56 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2300.codfw.wmnet with reason: REIMAGE
* 21:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2306.codfw.wmnet
* 21:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2304.codfw.wmnet
* 21:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2302.codfw.wmnet with reason: REIMAGE
* 21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2304.codfw.wmnet
* 21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2306.codfw.wmnet
* 21:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2302.codfw.wmnet with reason: REIMAGE
* 21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1338.eqiad.wmnet
* 21:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1337.eqiad.wmnet
* 21:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1338.eqiad.wmnet
* 21:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw13388.eqiad.wmnet
* 21:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet
* 21:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a87a69a]: correct alter table syntax to create wbitem table (duration: 03m 09s)
* 21:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2308.codfw.wmnet
* 21:28 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2308.codfw.wmnet
* 21:27 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a87a69a]: correct alter table syntax to create wbitem table
* 21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2304.codfw.wmnet with reason: REIMAGE
* 21:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2304.codfw.wmnet with reason: REIMAGE
* 21:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2306.codfw.wmnet with reason: REIMAGE
* 21:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2306.codfw.wmnet with reason: REIMAGE
* 21:06 ebernhardson: restart airflow-scheduler and airflow-webserver on an-airflow1001 post-deploy
* 21:05 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@2662ca2]: ship hourly link recommendations (duration: 08m 30s)
* 20:57 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@2662ca2]: ship hourly link recommendations
* 20:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 20:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate 5 NavigationTiming schemas to Event Platform on testwiki - [[phab:T271208|T271208]] (duration: 01m 17s)
* 20:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2308.codfw.wmnet with reason: REIMAGE
* 20:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1413.eqiad.wmnet
* 20:52 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission
* 20:52 ryankemper: [[phab:T272444|T272444]] (Decommission relforge100[1,2]) Beginning decommission of `relforge1002`: `sudo -i cookbook sre.hosts.decommission relforge1002.eqiad.wmnet -t [[phab:T272444|T272444]]`
* 20:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2308.codfw.wmnet with reason: REIMAGE
* 20:50 dancy: group0 rolled back to 1.36.0-wmf.27
* 20:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1411.eqiad.wmnet
* 20:50 dancy@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1338.eqiad.wmnet with reason: REIMAGE
* 20:47 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1411.eqiad.wmnet
* 20:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1337.eqiad.wmnet with reason: REIMAGE
* 20:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1338.eqiad.wmnet with reason: REIMAGE
* 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1337.eqiad.wmnet with reason: REIMAGE
* 20:42 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.28
* 20:40 ryankemper: [[phab:T272444|T272444]] (Decommission relforge100[1,2]) Beginning decommission of `relforge1001`: `sudo -i cookbook sre.hosts.decommission relforge1001.eqiad.wmnet -t [[phab:T272444|T272444]]`
* 20:40 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission
* 20:39 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 20:39 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission
* 20:37 ryankemper: [[phab:T272444|T272444]] (Decommission relforge100[1,2]) Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/657453 prior to running decom cookbook
* 20:36 ryankemper: [[phab:T272444|T272444]] (Decommission relforge100[1,2]) Downtimed `relforge100[1,2]` in Icinga cookbook for the next 26 hours
* 20:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
* 20:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
* 20:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1411.eqiad.wmnet with reason: REIMAGE
* 20:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1411.eqiad.wmnet with reason: REIMAGE
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2321.codfw.wmnet
* 20:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2321.codfw.wmnet
* 20:03 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1409.eqiad.wmnet
* 20:01 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1408.eqiad.wmnet
* 19:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1408.eqiad.wmnet
* 19:53 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2317.codfw.wmnet
* 19:49 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2317.codfw.wmnet
* 19:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1409.eqiad.wmnet
* 19:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet
* 19:18 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2317.codfw.wmnet with reason: REIMAGE
* 19:16 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2317.codfw.wmnet with reason: REIMAGE
* 19:13 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2321.codfw.wmnet
* 18:58 moritzm: installing sudo security updates on Jessie
* 18:57 moritzm: uploaded sudo 1.8.10p3-1+deb8u7+wmf1 to apt.wikimedia.org
* 18:46 dancy@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.28 (duration: 40m 09s)
* 18:37 moritzm: installing sudo security updates on Stretch
* 18:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming after rebuild
* 18:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming after rebuild
* 18:15 moritzm: installing sudo security updates on Buster
* 18:07 dancy@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.28
* 17:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1408.eqiad.wmnet with reason: REIMAGE
* 17:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1409.eqiad.wmnet with reason: REIMAGE
* 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1408.eqiad.wmnet with reason: REIMAGE
* 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1409.eqiad.wmnet with reason: REIMAGE
* 17:19 mutante: ms-be1028 - running puppet to clear ferm icinga alert
* 17:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1337.eqiad.wmnet with reason: REIMAGE
* 17:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2321.codfw.wmnet with reason: REIMAGE
* 17:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1337.eqiad.wmnet with reason: REIMAGE
* 17:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2321.codfw.wmnet with reason: REIMAGE
* 16:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1408.eqiad.wmnet with reason: REIMAGE
* 16:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1409.eqiad.wmnet with reason: REIMAGE
* 16:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1408.eqiad.wmnet with reason: REIMAGE
* 16:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1409.eqiad.wmnet with reason: REIMAGE
* 16:50 marostegui: Deploy schema change on testwiki - [[phab:T272953|T272953]]
* 16:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 16:43 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 16:43 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 16:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: REIMAGE
* 16:42 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 16:42 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 16:42 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 16:42 mutante: reimaging l33t jobrunner mw1337
* 16:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: REIMAGE
* 16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 16:02 moritzm: installing mutt security updates on buster
* 14:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: REIMAGE
* 14:56 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: REIMAGE
* 14:44 hnowlan: reimaging maps1009 as new buster master
* 14:23 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 14:23 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:23 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 14:22 akosiaris: restart pybal on lvs1015, lvs1016, lvs2009, lvs2010 for picking up linkrecommendation, similar-users, apertium-tls LVS services.
* 14:21 marostegui: Install mariadb 10.4.18 on pc2010 - [[phab:T268457|T268457]]
* 14:13 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 14:07 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 14:05 marostegui: Restart db1077
* 14:03 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 13:41 arturo: admin update some kubernetes-related packages in buster-wikimedia/thirdparty/kubeadm-k8s-1-17 ([[phab:T263284|T263284]])
* 13:30 hashar: Upgrading and restarting Jenkins on release1002 / releases2002 / contint1001 and contint2001
* 12:34 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=zhwiki --fix # [[phab:T271612|T271612]] # P13960
* 12:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11cfef4f05612771d6a7cbe27f9bb1fbb41e0e5d}}: Add WikiProject and WikiProject_talk namespace and its aliases for zh.wikipedia ([[phab:T271612|T271612]]) (duration: 01m 01s)
* 12:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|080389dbac5bb2cddab7640071e43674a868e945}}: Add localized Wikivoyage wordmark for the mobile view of Turkish Wikivoyage ([[phab:T272776|T272776]]; 2/2) (duration: 01m 02s)
* 12:24 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikivoyage-wordmark-tr.svg: {{Gerrit|080389dbac5bb2cddab7640071e43674a868e945}}: Add localized Wikivoyage wordmark for the mobile view of Turkish Wikivoyage ([[phab:T272776|T272776]]; 1/2) (duration: 01m 01s)
* 12:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4dfc28a4a759050726561da861a9e1030b529d3e}}: Add Turkish Powered by MediaWiki and A Wikimedia project icons for Turkish Wikivoyage ([[phab:T272781|T272781]]) (duration: 01m 00s)
* 12:12 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=trwikivoyage --cluster=all
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eab535fcc983d57dd36c41309162ace8aadcae1a}}: Add namespace aliases to Turkish Wikivoyage ([[phab:T272782|T272782]]) (duration: 01m 00s)
* 11:47 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:46 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 11:44 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 11:29 moritzm: imported jenkins 2.263.3 to apt.wikimedia.org (thirdparty/ci)
* 09:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
* 09:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
* 09:37 elukey: reboot dbstore1005 for kernel upgrades
* 09:34 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Resync: Some mw2xxx hosts have old version (duration: 00m 55s)
* 09:32 godog: disable mdadm check emails on ms-be1022 / known, and host is going to be decom'd - [[phab:T267870|T267870]]
* 09:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: Restart mariadb to pick up config changes [[phab:T272957|T272957]]
* 09:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: Restart mariadb to pick up config changes [[phab:T272957|T272957]]
* 09:28 elukey: reboot dbstore1003 for kernel upgrades
* 09:24 urbanecm@deploy1001: Synchronized wmf-config/logos.php: Resyncing to fix mw2xxx apache loading (duration: 00m 57s)
* 09:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 09:14 elukey: reboot dbstore1004 for kernel upgrades
* 09:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eab87780}}: frwiki: Fix tagline height and width ([[phab:T272907|T272907]]) (duration: 00m 58s)
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 (db1175 isn't ready yet)', diff saved to https://phabricator.wikimedia.org/P13959 and previous config saved to /var/cache/conftool/dbconfig/20210126-091236-marostegui.json
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to clone db1175 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13958 and previous config saved to /var/cache/conftool/dbconfig/20210126-091149-marostegui.json
* 09:06 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:53 marostegui: Stop mysql on db1081 to clone db1160
* 08:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
* 08:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
* 08:38 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1119,1131].eqiad.wmnet
* 08:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
* 08:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1119,1131].eqiad.wmnet
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
* 08:31 godog: swift start decom for ms-be20[17,19,21,23,24,25,26,27] - [[phab:T272837|T272837]]
* 08:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1119.eqiad.wmnet with reason: REIMAGE
* 08:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1131.eqiad.wmnet with reason: REIMAGE
* 08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1131.eqiad.wmnet with reason: REIMAGE
* 08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1119.eqiad.wmnet with reason: REIMAGE
* 08:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 08:18 moritzm: upgrading OpenJDK on aqs and Hadoop systems
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1081 (s4 old master) - [[phab:T271427|T271427]]', diff saved to https://phabricator.wikimedia.org/P13955 and previous config saved to /var/cache/conftool/dbconfig/20210126-070443-marostegui.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 master and remove read-only from s4 [[phab:T271427|T271427]]', diff saved to https://phabricator.wikimedia.org/P13954 and previous config saved to /var/cache/conftool/dbconfig/20210126-070152-marostegui.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance [[phab:T271427|T271427]]', diff saved to https://phabricator.wikimedia.org/P13953 and previous config saved to /var/cache/conftool/dbconfig/20210126-070037-marostegui.json
* 07:00 marostegui: Starting s4 eqiad failover from db1081 to db1138 - [[phab:T271427|T271427]]
* 06:55 ryankemper: Restarted `wdqs-blazegraph` on `wdqs1005` - its blazegraph was deadlocked (based on the presence of null values for the blazegraph metrics for that host)
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Set candidate master to weight 0 before the failover [[phab:T271427|T271427]]', diff saved to https://phabricator.wikimedia.org/P13952 and previous config saved to /var/cache/conftool/dbconfig/20210126-054337-marostegui.json
* 00:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2331.codfw.wmnet
* 00:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2318.codfw.wmnet
* 00:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2319.codfw.wmnet
* 00:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2320.codfw.wmnet
* 00:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2331.codfw.wmnet
* 00:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2318.codfw.wmnet
* 00:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2319.codfw.wmnet
* 00:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2320.codfw.wmnet
* 00:34 legoktm@deploy1001: Synchronized wmf-config/CommonSettings.php: Invalidate configuration cache when logos.php is touched too (duration: 00m 56s)
* 00:32 legoktm@deploy1001: Synchronized wmf-config/logos.php: Add script to mostly automate logo management (duration: 00m 55s)
* 00:16 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Split $wmgSiteLogo<nowiki>{</nowiki>1,1_5,2<nowiki>}</nowiki>x to a separate logos.php (1/2) (duration: 01m 00s)
* 00:14 legoktm@deploy1001: Synchronized wmf-config/logos.php: Split $wmgSiteLogo<nowiki>{</nowiki>1,1_5,2<nowiki>}</nowiki>x to a separate logos.php (1/2) (duration: 00m 56s)
* 00:08 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T272920|T272920]]: arbcom_enwiki: Change favicon to a renamed copy of arbcom_ruwiki.ico (2/2) (duration: 00m 58s)
* 00:07 legoktm@deploy1001: Synchronized static/favicon/arbcom_enwiki.ico: [[phab:T272920|T272920]]: arbcom_enwiki: Change favicon to a renamed copy of arbcom_ruwiki.ico (1/2) (duration: 01m 00s)


== 2021-01-25 ==
== 2021-08-26 ==
* 23:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2318.codfw.wmnet with reason: REIMAGE
* 22:06 legoktm: restarted mailman3-web on lists1001 ([[phab:T289798|T289798]])
* 23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2319.codfw.wmnet with reason: REIMAGE
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2318.codfw.wmnet with reason: REIMAGE
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2319.codfw.wmnet with reason: REIMAGE
* 19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.20
* 23:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
* 18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2320.codfw.wmnet with reason: REIMAGE
* 18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 23:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2320.codfw.wmnet with reason: REIMAGE
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1338.eqiad.wmnet
* 18:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|66717bc039f40336144dcc0dfd97ff5331b418e9}}: Install Extension Quiz on ja.wikibooks ([[phab:T289383|T289383]]) (duration: 01m 05s)
* 22:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2322.codfw.wmnet
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2323.codfw.wmnet
* 18:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum1001.eqiad.wmnet with reason: testing out durum
* 22:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2322.codfw.wmnet
* 18:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum1001.eqiad.wmnet with reason: testing out durum
* 22:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2323.codfw.wmnet
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1338.eqiad.wmnet
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cde88918b73628f2eaaff919ddb869b4dc2c93c6}}: Install Extension Quiz on fa.wikibooks ([[phab:T289381|T289381]]) (duration: 01m 07s)
* 21:45 cstone: civicrm revision changed from {{Gerrit|3afb54f6f9}} to {{Gerrit|dfb2ea2148}}
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1338.eqiad.wmnet with reason: REIMAGE
* 18:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d4340e9c18468d14885c8ced87f1e014a3481f2a}}: Finalize Event Platform migration of EchoEmail and EchoInteraction ([[phab:T287210|T287210]]) (duration: 01m 07s)
* 21:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
* 17:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1338.eqiad.wmnet with reason: REIMAGE
* 17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2326.codfw.wmnet
* 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2326.codfw.wmnet
* 17:30 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.20 (duration: 01m 05s)
* 20:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1410.eqiad.wmnet
* 17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1410.eqiad.wmnet
* 17:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.20
* 20:35 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 17:26 dancy@deploy1002: Synchronized php-1.37.0-wmf.20/includes/page/PageStore.php: Backport: [[gerrit:714864{{!}}PageStore: Pass query flags to getPageById() too (T289717 T195069)]] (duration: 01m 05s)
* 20:35 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 16:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 20:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2323.codfw.wmnet with reason: REIMAGE
* 16:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 20:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2322.codfw.wmnet with reason: REIMAGE
* 15:56 sukhe: ran homer for Gerrit 715007: Set up BGP peering to durum1001 in eqiad
* 20:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2323.codfw.wmnet with reason: REIMAGE
* 15:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 20:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2322.codfw.wmnet with reason: REIMAGE
* 15:40 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2326.codfw.wmnet with reason: REIMAGE
* 14:24 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=plwiki --prune --batch-size=10 --sleep=2 ([[phab:T289249|T289249]])
* 20:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2326.codfw.wmnet with reason: REIMAGE
* 13:19 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1410.eqiad.wmnet with reason: REIMAGE
* 13:15 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 20:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1410.eqiad.wmnet with reason: REIMAGE
* 13:04 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 20:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2324.codfw.wmnet
* 12:59 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 19:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 12:57 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 19:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 12:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 19:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1412.eqiad.wmnet
* 12:21 sukhe: running puppet initial run on durum1001.eqiad.wmnet - [[phab:T289536|T289536]]
* 19:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1411.eqiad.wmnet
* 11:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 11:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2324.codfw.wmnet
* 11:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2326.codfw.wmnet
* 11:40 Lucas_WMDE: EU backport+config window done
* 19:48 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 11:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw14124.eqiad.wmnet
* 11:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: [[gerrit:714853{{!}}Allow rendering of <nowiki><math>0</math></nowiki> (T288846)]] (duration: 01m 04s)
* 19:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1411.eqiad.wmnet
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: [[gerrit:714854{{!}}Allow rendering of <nowiki><math>0</math></nowiki> (T288846)]] (duration: 01m 05s)
* 19:44 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 11:32 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum1001.eqiad.wmnet
* 19:44 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 11:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum1001.eqiad.wmnet
* 19:44 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 11:20 nikerabbit@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:714770{{!}}Rename wgTranslateBlacklist to wgTranslateDisabledTargetLanguages]] (duration: 01m 05s)
* 19:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 11:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:41 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 10:09 vgutierrez: rolling restart of varnishkafka-statsv - [[phab:T289618|T289618]]
* 19:41 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:07 vgutierrez: disable puppet on cp-text to merge {{Gerrit|I52cf2a573980e33487d1f05f19b192ae7d13d717}} - [[phab:T286038|T286038]]
* 19:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 10:06 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 19:37 tgr_: Morning deploys done
* 10:01 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 19:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:36 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 19:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:30 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 19:29 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
* 19:29 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:21 elukey: elukey@kafka-main1001:~$ kafka acls --add --allow-principal User:CN=varnishkafka --producer --topic statsv - [[phab:T286038|T286038]]
* 19:25 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:658356{{!}}Enables MediaWiki client error instrument on English Wikipedia (T255585)]] (duration: 01m 01s)
* 09:21 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
* 19:20 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:20 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
* 19:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:657292{{!}}[beta] GrowthExperiments: set link recommendation feature flags ()]] (duration: 01m 06s)
* 09:17 elukey: restart varnishkafka-statsv on cp4032 to pick up TLS settings
* 19:00 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:15 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
* 18:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2324.codfw.wmnet with reason: REIMAGE
* 09:15 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
* 18:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2326.codfw.wmnet with reason: REIMAGE
* 09:13 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
* 18:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2326.codfw.wmnet with reason: REIMAGE
* 09:12 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
* 18:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1412.eqiad.wmnet with reason: REIMAGE
* 09:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
* 18:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1411.eqiad.wmnet with reason: REIMAGE
* 08:52 vgutierrez: restart varnishkafka-statsv on cp4032
* 18:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1412.eqiad.wmnet with reason: REIMAGE
* 06:59 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1138.eqiad.wmnet with reason: REIMAGE
* 18:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1411.eqiad.wmnet with reason: REIMAGE
* 06:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1138.eqiad.wmnet with reason: REIMAGE
* 16:40 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:48 godog: more weight to ms-be20[62-65] - [[phab:T288458|T288458]]
* 16:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1160 [[phab:T288273|T288273]]', diff saved to https://phabricator.wikimedia.org/P17085 and previous config saved to /var/cache/conftool/dbconfig/20210826-064655-marostegui.json
* 16:07 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
* 06:43 marostegui: Reimage s4 eqiad master (db1138),  expect lag on eqiad [[phab:T288803|T288803]]
* 15:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
* 06:37 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
* 06:33 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 15:42 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
* 15:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
* 15:23 dcausse@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CirrusSearch/: revert: Add an option to limit the size of the file_text field: [[phab:T271493|T271493]] (duration: 01m 05s)
* 15:20 dcausse@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CirrusSearch/: Add an option to limit the size of the file_text field: [[phab:T271493|T271493]] (duration: 00m 58s)
* 15:16 dcausse: re-opening EU Backport window to ship pending patches
* 15:10 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 15:09 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 14:37 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove 2 Remove migrated EventLoggingSchemas overrides - [[phab:T259163|T259163]], [[phab:T267352|T267352]] (duration: 00m 56s)
* 14:35 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 14:34 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:31 akosiaris@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:28 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:28 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:25 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 12:47 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|6a4cbe662655edaa4f6c36e69877766a6a48d828}}: Revert "Switch fiwiki to their 500k temporary logo!": delete temporary logo files (duration: 00m 57s)
* 12:41 urbanecm@deploy1001: Synchronized wmf-config/MetaContactPages.php: {{Gerrit|7a6a60fcaa635a8f891a6d09f3611f8620490497}}: Create Contact page for Ombuds commission at Meta-Wiki ([[phab:T271828|T271828]]) (duration: 01m 00s)
* 12:41 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=arbcom_ruwiki --fix # [[phab:T272292|T272292]]
* 12:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|833833385f1cf02a4578edb9b5108d173bdf30bd}}: Adding namespace aliases on arbcom-ruwiki ([[phab:T272292|T272292]]) (duration: 00m 57s)
* 12:30 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateCollation.php --wiki=trwikivoyage --previous-collation=uppercase # [[phab:T272783|T272783]]
* 12:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bcc7ad7acf721a5e0521bbecfe6df8671ac1822c}}: Set $wgCategoryCollation = uca-tr on trwikivoyage ([[phab:T272783|T272783]]) (duration: 00m 57s)
* 12:27 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|d34cb3205a58d5ac50800f2f218af6213f74f5e7}}: Resize the logo of Turkish Wikivoyage ([[phab:T272784|T272784]]) (duration: 00m 54s)
* 12:23 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|177339d96616b5941dbeb2c90ca6aa0be90e3b5a}}: Defining wgSitename for trwikivoyage ([[phab:T272779|T272779]]) (duration: 01m 00s)
* 12:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|89d072378e16b0410d963deca2fd766c1406b5b6}}: Enable SandboxLink on Turkish Wikivoyage ([[phab:T272780|T272780]]) (duration: 01m 05s)
* 12:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|75aa32fd5aee1feebe8a97360068da55cbcf06d8}}: frwiki: Change back to normal logo ([[phab:T272700|T272700]]) (duration: 01m 07s)
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|693eaec20a24620c2a709c8bac707c0d7af3436b}}: Add bidgee.id.au to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T272202|T272202]]) (duration: 01m 01s)
* 11:40 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:658242{{!}} Bumping portals to master (T128546)]] (duration: 00m 55s)
* 11:39 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:658242{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 11:35 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 11:33 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 11:11 godog: thanos delete old orphaned blocks with replica=unset label
* 10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
* 10:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
* 10:44 godog: swift decrease weight for ms-be20[16,18,20,22] - [[phab:T272837|T272837]]
* 10:00 moritzm: installing imagemagick security updates on stretch
* 09:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1002.wikimedia.org
* 09:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader1002.wikimedia.org
* 09:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2002.wikimedia.org
* 09:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader2002.wikimedia.org
* 09:40 godog: bounce apache2 on logstash1024, stuck on high cpu
* 09:21 marostegui@deploy1001: Synchronized wmf-config/etcd.php: Add x2 to the mapping array [[phab:T269324|T269324]] (duration: 00m 58s)
* 09:17 moritzm: installing samba security updates on stretch
* 09:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Add x2 to the mapping array [[phab:T269324|T269324]] (duration: 01m 01s)
* 09:06 ema: cp3054: install varnish 6.0.1-1wm2 -- 6.0.1 without https://github.com/varnishcache/varnish-cache/pull/2705 [[phab:T264398|T264398]]
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: After upgrading its kernel', diff saved to https://phabricator.wikimedia.org/P13944 and previous config saved to /var/cache/conftool/dbconfig/20210125-084715-root.json
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: After upgrading its kernel', diff saved to https://phabricator.wikimedia.org/P13943 and previous config saved to /var/cache/conftool/dbconfig/20210125-083211-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: After upgrading its kernel', diff saved to https://phabricator.wikimedia.org/P13942 and previous config saved to /var/cache/conftool/dbconfig/20210125-081708-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: After upgrading its kernel', diff saved to https://phabricator.wikimedia.org/P13941 and previous config saved to /var/cache/conftool/dbconfig/20210125-080204-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13940 and previous config saved to /var/cache/conftool/dbconfig/20210125-073322-marostegui.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Add x2 eqiad to dbctl [[phab:T269324|T269324]]', diff saved to https://phabricator.wikimedia.org/P13939 and previous config saved to /var/cache/conftool/dbconfig/20210125-064419-marostegui.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Populate x2 eqiad hosts into dbctl [[phab:T269324|T269324]]', diff saved to https://phabricator.wikimedia.org/P13938 and previous config saved to /var/cache/conftool/dbconfig/20210125-064305-marostegui.json


== 2021-01-23 ==
* 22:21 volker-e@deploy1001: Finished deploy [design/style-guide@63e39e7]: Deploy design/style-guide: {{Gerrit|63e39e7}} “Components”: Amend button groups states SVG font stack (#427) (duration: 00m 06s)
* 22:21 volker-e@deploy1001: Started deploy [design/style-guide@63e39e7]: Deploy design/style-guide: {{Gerrit|63e39e7}} “Components”: Amend button groups states SVG font stack (#427)
* 04:05 ryankemper: Depooled `wdqs1013` (it has ~50 mins of lag to catch up on, and also the bad gateway above)
* 04:03 ryankemper: Restarted `wdqs-blazegraph` on `wdqs1013`: `sudo systemctl restart wdqs-blazegraph`
* 01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2332.codfw.wmnet
* 01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2328.codfw.wmnet
* 01:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2332.codfw.wmnet
* 01:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2328.codfw.wmnet
* 01:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2330.codfw.wmnet
* 01:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2334.codfw.wmnet
* 01:48 foks: reset user email for Davey2010
* 01:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 01:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 01:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2330.codfw.wmnet
* 01:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2334.codfw.wmnet
* 01:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 01:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1413.eqiad.wmnet
* 00:46 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch enwiki to use enwiki20 "Option A" logo variant ([[phab:T272526|T272526]]) (duration: 00m 57s)
* 00:36 legoktm@deploy1001: Synchronized static/images/project-logos/: Add enwiki20 "Option A" fixed logos ([[phab:T272526|T272526]]) (duration: 00m 59s)


== 2021-08-25 ==
* 23:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:20 urbanecm: Evening B&C window completed
* 23:19 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GlobalWatchlist/modules/EntryLog.js: {{Gerrit|230aec3fe7f3d0e325882a5fc926e9f3e4e86717}}: GlobalWatchlistEntryLog: fix storing log id ([[phab:T288385|T288385]]) (duration: 01m 07s)
* 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:10 legoktm@deploy1002: Synchronized debug.json: List primary DC servers first ([[phab:T289246|T289246]]) (duration: 01m 04s)
* 22:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Flow/includes/Content/BoardContent.php: {{Gerrit|694b94657d251df64145e8153b269094bba75be9}}: BoardContent: Fix deprecation warning ([[phab:T289625|T289625]]) (duration: 01m 04s)
* 22:04 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/VisualEditor/includes/ApiVisualEditor.php: {{Gerrit|73478bc9c72286123cef69e57e0aef9e745dcff9}}: Make sure params is an array ([[phab:T289730|T289730]]) (duration: 01m 04s)
* 22:00 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:59 brennen: 1.37.0-wmf.20 train status ([[phab:T281161|T281161]]) blockers should be patched shortly; as we've reached the 15:00 Pacific deploy cutoff for the day, train will resume first thing in US morning
* 21:58 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:35 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|cc04b33dec6b9aed1d7621957c4de527266600d1}}: EventDispatcher: Try really, really hard to read from master ([[phab:T289717|T289717]]) (duration: 01m 04s)
* 21:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/includes/page/PageStore.php: {{Gerrit|34fb2b99104d0a2bda8aa202f4cdeb07cb983531}}: PageStore: Pass query flags to getPageByName() ([[phab:T289717|T289717]]; [[phab:T195069|T195069]]) (duration: 01m 06s)
* 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:14 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: {{Gerrit|190d8b7579af981cf2f5e4a6d9457ee0a7edca3f}}: Use Parser::getUserIdentity() instead of ::getUser() in SimpleCaptcha ([[phab:T289731|T289731]]) (duration: 01m 05s)
* 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:03 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/ProofreadPage/: {{Gerrit|913043a5ca7982e07ab0c01f88076af866a43cc3}}: Fixes exception thrown by FilePagination::getPageNumber ([[phab:T289728|T289728]]) (duration: 01m 06s)
* 20:02 brennen: 1.37.0-wmf.20 ([[phab:T281161|T281161]]) status: blocked at group0; 2/3 blockers have probable patches, all seem to be getting attention, so holding off on blocker mail for now.
* 19:54 urbanecm: enwikisource: Start server-side upload for one video file ([[phab:T289698|T289698]])
* 19:45 urbanecm: Start server-side upload for ~2 GB tiff file ([[phab:T289711|T289711]])
* 19:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:28 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.19 (duration: 01m 05s)
* 19:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.19
* 19:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:14 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.20 (duration: 01m 04s)
* 19:13 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.20
* 19:10 eileen: tools revision changed from {{Gerrit|15bfaa7117}} to {{Gerrit|14e4125f73}}
* 18:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:42 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:25 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Flow/modules/editor/editors/visualeditor/ui/inspectors/mw.flow.ve.ui.MentionInspector.js: {{Gerrit|dd464b4522effbfabea371f8b95b0b25d53da43e}}: Fix reference to renamed abortAllApiRequests method ([[phab:T289648|T289648]]) (duration: 01m 04s)
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:23 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/skins/WikimediaApiPortal/src/Component/NotificationAlertComponent.php: {{Gerrit|a5bfcc8def96ad1b44fff31c4c1965311be2982a}}: Remove call to text() on string ([[phab:T289692|T289692]]) (duration: 01m 04s)
* 18:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e7c8c041faa974585128c48631522a401fb3d41d}}: Add Wikimedia ES to $wgCopyUploadsDomains whitelist ([[phab:T289446|T289446]]) (duration: 01m 04s)
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e6df0803e4eaca91bd725bcd376b260b97917de3}}: Disable legacy media dom on a few more wikis ([[phab:T51097|T51097]]) (duration: 01m 05s)
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:15 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5182ac88263f23c15a3b10d0f3bc2e492fe425d5}}: Disable upcoming DiscussionTools automatic topic subscriptions for now (duration: 01m 04s)
* 18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:06 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2b14eb525e99008d5103a93c5bd01f75211dca99}}: Enable topic subscriptions as a beta feature on Wikipedias except enwiki ([[phab:T287801|T287801]]) (duration: 01m 06s)
* 18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:48 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:46 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Wikibase/repo/includes/Content/EntityHandler.php: Backport: [[gerrit:714674{{!}}Set EntityHandler::generateHTMLOnEdit to false (T285987)]] (duration: 01m 06s)
* 17:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:38 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Wikibase: Backport: [[gerrit:714677{{!}}Return normalized snaks from SetClaim, SetReference (T289501)]] (duration: 01m 11s)
* 17:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:14 ryankemper: [[phab:T289483|T289483]] Depooled `wdqs1013`
* 17:14 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Wikibase/repo/includes/Content/EntityHandler.php: Backport: [[gerrit:714675{{!}}Set EntityHandler::generateHTMLOnEdit to false (T285987)]] (duration: 01m 18s)
* 17:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:22 urbanecm: Run `User::newSystemUser( 'MediaWiki default', ['steal' => true] )` in mywiki shell.php session (same issue as [[phab:T289690|T289690]])
* 15:16 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=zh_yuewiki growthexperiments # [[phab:T289680|T289680]]
* 15:04 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 15:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/GrowthExperiments/includes/Config/WikiPageConfigWriter.php: {{Gerrit|0b9ca1e11c1f0397847d4cfc7bc86220b6ebe9f6}}: WikiPageConfigWriter: Fix `autopatrol` right name ([[phab:T288886|T288886]]) (duration: 01m 04s)
* 15:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0ccac4b2816f01c4b035aa51cbe4651c715632e0}}: Deploy Growth features to 44 new Wikipedias in dark mode ([[phab:T289680|T289680]]; 3/3) (duration: 01m 06s)
* 14:59 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 14:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 14:56 urbanecm@deploy1002: Synchronized wmf-config/config/: {{Gerrit|0ccac4b2816f01c4b035aa51cbe4651c715632e0}}: Deploy Growth features to 44 new Wikipedias in dark mode ([[phab:T289680|T289680]]; 2/3) (duration: 01m 05s)
* 14:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:55 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|0ccac4b2816f01c4b035aa51cbe4651c715632e0}}: Deploy Growth features to 44 new Wikipedias in dark mode ([[phab:T289680|T289680]]; 1/3) (duration: 01m 06s)
* 14:54 urbanecm@deploy1002: sync-file aborted: {{Gerrit|0ccac4b2816f01c4b035aa51cbe4651c715632e0}}: Deploy Growth features to 44 new Wikipedias in dark mode ([[phab:T289680|T289680]]) (duration: 00m 01s)
* 14:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 14:52 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 14:46 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 14:42 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=brwiki # [[phab:T289690|T289690]], [[phab:T289680|T289680]]
* 14:40 urbanecm: Run `User::newSystemUser( 'MediaWiki default', ['steal' => true] )` in brwiki shell.php session ([[phab:T289690|T289690]])
* 14:35 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 14:32 urbanecm: mwmaint2002: scap pull # clearing temporary config changes
* 14:30 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 14:29 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet
* 14:26 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 14:25 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 14:23 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/initWikiConfig.php # [[phab:T289680|T289680]] # r714765 applied at mwmaint2002
* 14:22 urbanecm: Apply https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/714765/ at mwmaint2002 temporarily ([[phab:T289680|T289680]])
* 14:21 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 14:20 urbanecm: Create GrowthExperiments DB tables for wikis listed in P17081 ([[phab:T289680|T289680]])
* 14:20 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2003.codfw.wmnet
* 14:18 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-etcd2003.codfw.wmnet
* 14:17 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2002.codfw.wmnet
* 14:15 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-etcd2002.codfw.wmnet
* 14:12 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2001.codfw.wmnet
* 14:10 ejegg: updated fundraising CiviCRM from {{Gerrit|d60442e119}} to {{Gerrit|13bf3a02df}}
* 14:08 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2001.codfw.wmnet
* 13:59 volans@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:05:00 on cumin2001.codfw.wmnet with reason: apostrophe's test failure
* 13:59 volans@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2001.codfw.wmnet with reason: apostrophe's test failure
* 13:57 ejegg: updated fundraising CiviCRM from {{Gerrit|42bb64c608}} to {{Gerrit|d60442e119}}
* 13:53 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: apostrophe's test
* 13:53 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: apostrophe's test
* 13:51 volans: upgraded spicerack to 0.0.58 on cumin2002
* 13:37 joal@deploy1002: Finished deploy [analytics/refinery@7bed213] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7bed213] (duration: 05m 55s)
* 13:32 joal@deploy1002: Started deploy [analytics/refinery@7bed213] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7bed213]
* 13:31 joal@deploy1002: Finished deploy [analytics/refinery@7bed213] (thin): Regular analytics weekly train THIN [analytics/refinery@7bed213] (duration: 00m 07s)
* 13:31 joal@deploy1002: Started deploy [analytics/refinery@7bed213] (thin): Regular analytics weekly train THIN [analytics/refinery@7bed213]
* 13:31 joal@deploy1002: Finished deploy [analytics/refinery@7bed213]: Regular analytics weekly train [analytics/refinery@7bed213] (duration: 20m 25s)
* 13:10 joal@deploy1002: Started deploy [analytics/refinery@7bed213]: Regular analytics weekly train [analytics/refinery@7bed213]
* 13:03 jayme: restarted all pods in kube-system namespace in codfw k8s cluster - [[phab:T289131|T289131]]
* 12:25 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:21 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 11:39 jayme: slowly restarting all pods in kube-system namespace in eqiad k8s cluster - [[phab:T289131|T289131]]
* 11:38 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-coord1002.eqiad.wmnet
* 11:32 kharlan@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: Backport: [[gerrit:714670{{!}}ApiVisualEditorEdit: data-<nowiki>{</nowiki>plugin<nowiki>}</nowiki> is not multi (T289652)]] (duration: 01m 06s)
* 11:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 volans: uploaded spicerack_0.0.58 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 11:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
* 10:57 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
* 10:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
* 10:49 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/includes/Storage/DerivedPageDataUpdater.php: Backport: [[gerrit:714672{{!}}Introduce concept of generateHTMLOnEdit() for ContentHandler (T285987)]], Part II (duration: 01m 04s)
* 10:47 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/includes/content/ContentHandler.php: Backport: [[gerrit:714672{{!}}Introduce concept of generateHTMLOnEdit() for ContentHandler (T285987)]], Part I (duration: 01m 08s)
* 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
* 10:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:21 jbond: rolling out openssl updates
* 10:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:03 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.20/includes: Backport: [[gerrit:714671{{!}}Introduce concept of generateHTMLOnEdit() for ContentHandler (T285987)]] (duration: 02m 17s)
* 10:01 mutante: - removed jmads from wmf group
* 09:59 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-coord1002.eqiad.wmnet
* 09:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
* 09:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
* 09:35 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 09:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
* 09:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 09:35 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 09:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 09:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
* 08:59 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2033.codfw.wmnet with reason: REIMAGE
* 08:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2033.codfw.wmnet with reason: REIMAGE
* 08:17 godog: swift codfw add ms-be20[62-65] with initial weight - [[phab:T288458|T288458]]
* 07:01 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for reimage [[phab:T288803|T288803]]', diff saved to https://phabricator.wikimedia.org/P17078 and previous config saved to /var/cache/conftool/dbconfig/20210825-064319-marostegui.json
* 06:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2118.codfw.wmnet with reason: Reimaging [[phab:T288244|T288244]]
* 06:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2118.codfw.wmnet with reason: Reimaging [[phab:T288244|T288244]]
* 06:07 kormat@cumin1001: dbctl commit (dc=all): 'Depool db2118 until it's reimaged to buster [[phab:T289129|T289129]]', diff saved to https://phabricator.wikimedia.org/P17077 and previous config saved to /var/cache/conftool/dbconfig/20210825-060742-kormat.json
* 06:02 kormat@cumin1001: dbctl commit (dc=all): 'Promote db2121 to s7 primary and set section read-write [[phab:T289129|T289129]]', diff saved to https://phabricator.wikimedia.org/P17076 and previous config saved to /var/cache/conftool/dbconfig/20210825-060222-kormat.json
* 06:01 kormat@cumin1001: dbctl commit (dc=all): 'Set s7 codfw as read-only for maintenance - [[phab:T289129|T289129]]', diff saved to https://phabricator.wikimedia.org/P17075 and previous config saved to /var/cache/conftool/dbconfig/20210825-060112-kormat.json
* 06:00 kormat: Starting s7 codfw failover from db2118 to db2121 - [[phab:T289129|T289129]]
* 05:33 eileen: civicrm revision changed from {{Gerrit|a4ce949828}} to {{Gerrit|42bb64c608}}, config revision is {{Gerrit|1afcea7f5b}}
* 05:28 kormat: Moving s7 codfw replicas under db2121 - [[phab:T289129|T289129]]
* 05:27 kormat@cumin1001: dbctl commit (dc=all): 'Set db2121 with weight 0 [[phab:T289129|T289129]]', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20210825-052741-kormat.json
* 05:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:04:00 on 27 hosts with reason: Primary switchover s7 [[phab:T289129|T289129]]
* 05:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:04:00 on 27 hosts with reason: Primary switchover s7 [[phab:T289129|T289129]]
* 02:06 eileen: civicrm revision changed from {{Gerrit|8ed303f2d1}} to {{Gerrit|a4ce949828}}, config revision is {{Gerrit|ac2d75d4a8}}
* 00:53 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 00:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 00:47 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .


== 2021-01-22 ==
* 22:41 reedy@deploy1001: Synchronized invalid.json: (no justification provided) (duration: 00m 58s)
* 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 20:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2330.codfw.wmnet with reason: REIMAGE
* 20:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2332.codfw.wmnet with reason: REIMAGE
* 20:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2328.codfw.wmnet with reason: REIMAGE
* 20:01 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
* 20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2334.codfw.wmnet with reason: REIMAGE
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2328.codfw.wmnet with reason: REIMAGE
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2330.codfw.wmnet with reason: REIMAGE
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2332.codfw.wmnet with reason: REIMAGE
* 19:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2334.codfw.wmnet with reason: REIMAGE
* 19:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2356.codfw.wmnet
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2354.codfw.wmnet
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2352.codfw.wmnet
* 19:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2350.codfw.wmnet
* 19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2352.codfw.wmnet
* 19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2350.codfw.wmnet
* 19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2354.codfw.wmnet
* 19:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2356.codfw.wmnet
* 19:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE
* 19:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE
* 19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE
* 19:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE
* 19:09 mutante: releases1002 systemctl reset-failed
* 19:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE
* 19:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE
* 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE


== 2021-08-24 ==
* 22:05 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 22:04 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:10 tgr: running extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php on various wikis per [[phab:T282873|T282873]]#7303828
* 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:55 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a6fd96b15e6e3c068c2faac60208b9722d32af0f}}: Growth features: Promote 9 wikis out of dark mode ([[phab:T287871|T287871]]; [[phab:T287874|T287874]]; [[phab:T287872|T287872]]; [[phab:T287880|T287880]]; [[phab:T287868|T287868]]; [[phab:T287873|T287873]]; [[phab:T287879|T287879]]; [[phab:T287875|T287875]]; [[phab:T287876|T287876]]) (duration: 01m 25s)
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:35 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.17 (duration: 01m 48s)
* 20:33 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.18 (duration: 03m 26s)
* 20:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.20
* 20:18 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.20 (duration: 36m 32s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:41 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.20
* 17:23 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:19 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:17 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 15:26 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@e02c602]: transfer_to_es: stop adding data to article_topics (duration: 02m 17s)
* 15:23 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@e02c602]: transfer_to_es: stop adding data to article_topics
* 15:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:55 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:54 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:50 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
* 14:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
* 13:12 XioNoX: push pfw policies - [[phab:T289353|T289353]]
* 19:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE
* 12:45 vgutierrez: enable puppet on P:tlsproxy::envoy hosts - merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/710507/9
* 18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2364.codfw.wmnet
* 12:37 vgutierrez: disable puppet on P:tlsproxy::envoy hosts - merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/710507/9
* 18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2362.codfw.wmnet
* 12:33 godog: test patched python3-eventlet on thanos-fe1003 - [[phab:T283714|T283714]]
* 18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2360.codfw.wmnet
* 12:30 marostegui: Install 10.4.21 on clouddb1015
* 18:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2358.codfw.wmnet
* 11:27 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
* 18:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2362.codfw.wmnet
* 11:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
* 18:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2364.codfw.wmnet
* 09:08 jbond: upload new statograph version
* 18:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2360.codfw.wmnet
* 09:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 18:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2358.codfw.wmnet
* 09:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:17 mutante: releases2002 - rebooting to confirm works now and also new disk gets auto-mounted
* 08:54 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=dewiki --prune --batch-size=5 --sleep=5 ([[phab:T289249|T289249]])
* 18:03 mutante: releases1002 - replaced ens5 with ens6 in /etc/network/interfaaces and rebooted again
* 08:51 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=arwiki --prune --batch-size=5 --sleep=5 ([[phab:T289249|T289249]])
* 18:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk
* 08:01 godog: temp fix thanos-swift.discovery.wmnet in /etc/hosts to get swift-dispersion-stats to work - [[phab:T283714|T283714]]
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk
* 07:51 dcausse: repool wdqs1012 [[phab:T289551|T289551]]
* 17:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster
* 07:29 dcausse: restarting blazegraph on wdqs1012
* 17:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster
* 07:17 marostegui: Optimize huwiki.flaggedtemplates on db1127
* 17:57 mutante: releases1002 (releases.wm.org active backend) - rebooting - hopefully it does not run into [[phab:T272555|T272555]] but if it does now it's known how to fix
* 07:15 marostegui: Optimize huwiki.flaggedtemplates on db1098:3317
* 17:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE
* 06:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
* 17:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE
* 06:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
* 17:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE
* 03:51 rzl: rzl@wdqs1012:~$ sudo depool
* 17:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE
* 03:46 legoktm: wdqs1012 restarted prometheus-blazegraph-exporter-wdqs-blazegraph.service and prometheus-blazegraph-exporter-wdqs-categories.service after apparent exceptions/crashes
* 17:52 mutante: releases2001 - create new partition table with fdisk, make ext4 filesystem on /dev/vdb1
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE
* 00:17 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 17:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE
* 00:17 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 17:49 ppchelko@deploy1001: Finished deploy [restbase/deploy@e54225d]: [[phab:T270411|T270411]] [[phab:T270415|T270415]] [[phab:T270281|T270281]] [[phab:T270277|T270277]] (duration: 65m 37s)
* 00:17 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 17:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE
* 00:16 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@da9efa9]: 0.3.83 (duration: 07m 05s)
* 17:29 mforns@deploy1001: Finished deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 00m 07s)
* 00:10 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.83` on canary `wdqs1003`; proceeding to rest of fleet
* 17:29 mforns@deploy1001: Started deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253]
* 00:09 ryankemper@deploy1002: Started deploy [wdqs/wdqs@da9efa9]: 0.3.83
* 17:23 mforns@deploy1001: Finished deploy [analytics/refinery@eea071d]: Extra bug-fix train [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 10m 03s)
* 00:08 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.83`. Pre-deploy tests passing on canary `wdqs1003`
* 17:13 mforns@deploy1001: Started deploy [analytics/refinery@eea071d]: Extra bug-fix train [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253]
* 16:44 ppchelko@deploy1001: Started deploy [restbase/deploy@e54225d]: [[phab:T270411|T270411]] [[phab:T270415|T270415]] [[phab:T270281|T270281]] [[phab:T270277|T270277]]
* 16:40 cmjohnson1: replacing optics/fiber pfw3a-eqiad:xe-0/0/17 and fasw-c1a-eqiad:xe-0/2/0 [[phab:T271295|T271295]]
* 16:19 jynus: restart of backup source hosts on codfw [[phab:T271913|T271913]]
* 15:54 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 15:40 moritzm: installing puppetboard1002
* 15:24 moritzm: installing puppetboard2002
* 13:44 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13932 and previous config saved to /var/cache/conftool/dbconfig/20210122-134444-kormat.json
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P13931 and previous config saved to /var/cache/conftool/dbconfig/20210122-133341-marostegui.json
* 13:31 marostegui: Stop replication on db1121
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P13930 and previous config saved to /var/cache/conftool/dbconfig/20210122-133044-marostegui.json
* 13:29 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13929 and previous config saved to /var/cache/conftool/dbconfig/20210122-132939-kormat.json
* 13:21 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2002.codfw.wmnet
* 13:20 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 100%: Reboot [[phab:T272121|T272121]]', diff saved to https://phabricator.wikimedia.org/P13927 and previous config saved to /var/cache/conftool/dbconfig/20210122-132028-kormat.json
* 13:14 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13926 and previous config saved to /var/cache/conftool/dbconfig/20210122-131436-kormat.json
* 13:05 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 75%: Reboot [[phab:T272121|T272121]]', diff saved to https://phabricator.wikimedia.org/P13925 and previous config saved to /var/cache/conftool/dbconfig/20210122-130525-kormat.json
* 12:59 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13924 and previous config saved to /var/cache/conftool/dbconfig/20210122-125932-kormat.json
* 12:54 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host puppetboard2002.codfw.wmnet
* 12:53 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard1002.eqiad.wmnet
* 12:50 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 50%: Reboot [[phab:T272121|T272121]]', diff saved to https://phabricator.wikimedia.org/P13923 and previous config saved to /var/cache/conftool/dbconfig/20210122-125021-kormat.json
* 12:47 kormat@cumin1001: dbctl commit (dc=all): 'db1149 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13922 and previous config saved to /var/cache/conftool/dbconfig/20210122-124748-kormat.json
* 12:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1149.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 12:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1149.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 12:43 kormat@cumin1001: dbctl commit (dc=all): 'Remove db1110 from api group [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13921 and previous config saved to /var/cache/conftool/dbconfig/20210122-124310-kormat.json
* 12:38 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host puppetboard1002.eqiad.wmnet
* 12:38 kormat@cumin1001: dbctl commit (dc=all): 'Remove db1127 from api group [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13920 and previous config saved to /var/cache/conftool/dbconfig/20210122-123832-kormat.json
* 12:35 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 25%: Reboot [[phab:T272121|T272121]]', diff saved to https://phabricator.wikimedia.org/P13919 and previous config saved to /var/cache/conftool/dbconfig/20210122-123518-kormat.json
* 12:33 volker-e@deploy1001: Finished deploy [design/style-guide@9a811b8]: Deploy design/style-guide: {{Gerrit|9a811b8}} Add Language selectors to component overview Sketch document (#424) (duration: 00m 07s)
* 12:33 volker-e@deploy1001: Started deploy [design/style-guide@9a811b8]: Deploy design/style-guide: {{Gerrit|9a811b8}} Add Language selectors to component overview Sketch document (#424)
* 12:10 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1135,1137].eqiad.wmnet
* 12:08 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1135,1137].eqiad.wmnet
* 12:00 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13918 and previous config saved to /var/cache/conftool/dbconfig/20210122-120011-kormat.json
* 11:54 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 11:51 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13917 and previous config saved to /var/cache/conftool/dbconfig/20210122-115113-kormat.json
* 11:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on es1023.eqiad.wmnet with reason: Extended reboot for [[phab:T272121|T272121]]
* 11:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on es1023.eqiad.wmnet with reason: Extended reboot for [[phab:T272121|T272121]]
* 11:46 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13916 and previous config saved to /var/cache/conftool/dbconfig/20210122-114642-kormat.json
* 11:45 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13915 and previous config saved to /var/cache/conftool/dbconfig/20210122-114507-kormat.json
* 11:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1003.eqiad.wmnet
* 11:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1023.eqiad.wmnet with reason: Reboot for [[phab:T272121|T272121]]
* 11:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on es1023.eqiad.wmnet with reason: Reboot for [[phab:T272121|T272121]]
* 11:36 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13914 and previous config saved to /var/cache/conftool/dbconfig/20210122-113610-kormat.json
* 11:31 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13913 and previous config saved to /var/cache/conftool/dbconfig/20210122-113139-kormat.json
* 11:30 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13912 and previous config saved to /var/cache/conftool/dbconfig/20210122-113004-kormat.json
* 11:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mwdebug1003.eqiad.wmnet
* 11:24 kormat@cumin1001: dbctl commit (dc=all): 'es1023 depooling: enable report_host [[phab:T271106|T271106]]', diff saved to https://phabricator.wikimedia.org/P13911 and previous config saved to /var/cache/conftool/dbconfig/20210122-112424-kormat.json
* 11:24 hnowlan: joining restbase2009-a to cluster
* 11:21 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13910 and previous config saved to /var/cache/conftool/dbconfig/20210122-112106-kormat.json
* 11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1001.wikimedia.org
* 11:16 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13909 and previous config saved to /var/cache/conftool/dbconfig/20210122-111635-kormat.json
* 11:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader1001.wikimedia.org
* 11:15 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13908 and previous config saved to /var/cache/conftool/dbconfig/20210122-111500-kormat.json
* 11:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2001.wikimedia.org
* 11:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader2001.wikimedia.org
* 11:06 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13906 and previous config saved to /var/cache/conftool/dbconfig/20210122-110603-kormat.json
* 11:05 jbond42: deploy cairo updates to jessie
* 11:02 kormat@cumin1001: dbctl commit (dc=all): 'db1141 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13905 and previous config saved to /var/cache/conftool/dbconfig/20210122-110229-kormat.json
* 11:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1141.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 11:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1141.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 11:01 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13904 and previous config saved to /var/cache/conftool/dbconfig/20210122-110132-kormat.json
* 10:59 kormat@cumin1001: dbctl commit (dc=all): 'db1136 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13903 and previous config saved to /var/cache/conftool/dbconfig/20210122-105952-kormat.json
* 10:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1136.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 10:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1136.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 10:59 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db1127 to api group [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13902 and previous config saved to /var/cache/conftool/dbconfig/20210122-105921-kormat.json
* 10:56 kormat@cumin1001: dbctl commit (dc=all): 'db1134 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13901 and previous config saved to /var/cache/conftool/dbconfig/20210122-105636-kormat.json
* 10:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1134.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 10:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1134.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 10:53 kormat@cumin1001: dbctl commit (dc=all): 'Remove db1088 from api group [[phab:T271106|T271106]]', diff saved to https://phabricator.wikimedia.org/P13900 and previous config saved to /var/cache/conftool/dbconfig/20210122-105345-kormat.json
* 10:52 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13899 and previous config saved to /var/cache/conftool/dbconfig/20210122-105244-kormat.json
* 10:37 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13898 and previous config saved to /var/cache/conftool/dbconfig/20210122-103741-kormat.json
* 10:36 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13897 and previous config saved to /var/cache/conftool/dbconfig/20210122-103609-kormat.json
* 10:22 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13895 and previous config saved to /var/cache/conftool/dbconfig/20210122-102237-kormat.json
* 10:21 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13894 and previous config saved to /var/cache/conftool/dbconfig/20210122-102105-kormat.json
* 10:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host archiva1002.wikimedia.org
* 10:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host archiva1002.wikimedia.org
* 10:07 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13893 and previous config saved to /var/cache/conftool/dbconfig/20210122-100734-kormat.json
* 10:06 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13892 and previous config saved to /var/cache/conftool/dbconfig/20210122-100602-kormat.json
* 10:03 kormat@cumin1001: dbctl commit (dc=all): 'db1130 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13891 and previous config saved to /var/cache/conftool/dbconfig/20210122-100307-kormat.json
* 10:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1130.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 10:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1130.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 10:02 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db1110 to api group [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13890 and previous config saved to /var/cache/conftool/dbconfig/20210122-100233-kormat.json
* 09:52 moritzm: uploaded cairo 1.14.0-2.1+deb8u2+wmf1 to apt.wikimedia.org
* 09:50 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13889 and previous config saved to /var/cache/conftool/dbconfig/20210122-095058-kormat.json
* 09:44 kormat@cumin1001: dbctl commit (dc=all): 'db1093 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13888 and previous config saved to /var/cache/conftool/dbconfig/20210122-094453-kormat.json
* 09:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1093.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 09:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1093.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 09:43 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db1088 to api group [[phab:T271106|T271106]]', diff saved to https://phabricator.wikimedia.org/P13887 and previous config saved to /var/cache/conftool/dbconfig/20210122-094337-kormat.json
* 08:49 moritzm: installing PIP security updates for stretch
* 08:44 moritzm: installing mutt updates for stretch
* 08:35 XioNoX: Remove BGP for Zayo transit in ulsfo, eqiad, eqord
* 08:33 elukey: update puppet compiler's facts
* 07:26 ryankemper: [WDQS Deploy] WDQS deploy complete; service is healthy
* 06:59 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 06:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 06:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 06:58 ryankemper: [WDQS Deploy] Initial deploy complete, `query.wikidata.org` handles queries fine, proceeding to post-deploy steps
* 06:57 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@70f9d37]: 0.3.60 (duration: 10m 43s)
* 06:50 ryankemper: [WDQS Deploy] All tests passing on canary `wdqs1003` following canary WDQS deploy, proceeding to rest of fleet
* 06:46 ryankemper@deploy1001: Started deploy [wdqs/wdqs@70f9d37]: 0.3.60
* 06:46 ryankemper: [WDQS Deploy] All tests passing on canary `wdqs1003` before WDQS deploy, beginning deploy
* 06:45 ryankemper: [wdqs] re-pooled `wdqs1013` (all caught up on lag)
* 06:16 marostegui: Stop MySQL on db1117 db2133 db2078 [[phab:T272614|T272614]]
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2143 and db2144 as x2 codfw slaves [[phab:T269324|T269324]]', diff saved to https://phabricator.wikimedia.org/P13885 and previous config saved to /var/cache/conftool/dbconfig/20210122-060147-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2142 into x2 as codfw master [[phab:T269324|T269324]]', diff saved to https://phabricator.wikimedia.org/P13884 and previous config saved to /var/cache/conftool/dbconfig/20210122-060007-marostegui.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1118 weight', diff saved to https://phabricator.wikimedia.org/P13883 and previous config saved to /var/cache/conftool/dbconfig/20210122-054330-marostegui.json
* 01:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2368.codfw.wmnet
* 01:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2366.codfw.wmnet
* 01:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2374.codfw.wmnet
* 01:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2368.codfw.wmnet
* 01:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2366.codfw.wmnet
* 01:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2374.codfw.wmnet
* 01:19 Urbanecm: Evening B&C window finished
* 01:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/AbuseFilter/: {{Gerrit|7d8ab70d5b00142e8344e242dd085eb7bfa81145}}: Dont return the status of doBlockInternal when processing block actions (duration: 00m 59s)
* 01:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|376cba1b33dd68d40490a1498c59a4d430318ab1}}: Enroll idwiki in the DiscussionTools a/b test ([[phab:T268191|T268191]]) (duration: 00m 55s)
* 01:14 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/DiscussionTools/: {{Gerrit|513a7861bbcf06a8ac5c29e1b9838640cbd7c628}}: A/B test output when a specific feature is being tested ([[phab:T268191|T268191]]) (duration: 00m 55s)
* 01:12 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/WikibaseMediaInfo/: {{Gerrit|4b0259b761681ca90b3f3039019553ddca40a5fe}}: Distinguish between null continue value and unknown one ([[phab:T272548|T272548]]) (duration: 00m 59s)
* 01:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2376.codfw.wmnet
* 01:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2366.codfw.wmnet with reason: REIMAGE
* 01:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2376.codfw.wmnet
* 01:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2368.codfw.wmnet with reason: REIMAGE
* 01:00 Urbanecm: Evening B&C still in process, waiting on Zuul
* 00:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2366.codfw.wmnet with reason: REIMAGE
* 00:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
* 00:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2368.codfw.wmnet with reason: REIMAGE
* 00:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1174.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1167.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1168.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1165.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1170.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1169.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 00:46 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1164.eqiad.wmnet with reason: REIMAGE
* 00:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1174.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1165.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1169.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1168.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1164.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1166.eqiad.wmnet with reason: REIMAGE
* 00:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2376.codfw.wmnet with reason: REIMAGE
* 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2376.codfw.wmnet with reason: REIMAGE
* 00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2372.codfw.wmnet
* 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2370.codfw.wmnet
* 00:31 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
* 00:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d4f5d6f09977962be1c49471432125a92357ede6}}: Temporarily amend ukwiki AF configuration ([[phab:T272330|T272330]]) (duration: 01m 03s)
* 00:20 brennen@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/MobileFrontend: Backport: [[gerrit:657702{{!}}Fix toggling storage cleanup (T272638)]] (duration: 01m 07s)
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2372.codfw.wmnet
* 00:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2370.codfw.wmnet
* 00:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2370.codfw.wmnet with reason: new install on buster
* 00:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2370.codfw.wmnet with reason: new install on buster


== 2021-01-21 ==
* 23:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2372.codfw.wmnet with reason: REIMAGE
* 23:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2370.codfw.wmnet with reason: REIMAGE
* 23:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
* 23:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2370.codfw.wmnet with reason: REIMAGE
* 23:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2372.codfw.wmnet with reason: REIMAGE
* 23:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
* 22:10 brennen: 1.36.0-wmf.27 train status: for avoidance of doubt, no deploys until further notice - sorting out [[phab:T272638|T272638]]
* 21:27 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.26
* 20:06 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
* 20:04 razzi@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes - razzi@cumin1001
* 19:51 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ac99da75f9507e19472ab3020be638262857ec07}}: Migrate WebUIActionsTracking schemas to Event Platform on testwiki ([[phab:T267347|T267347]]; [[phab:T271164|T271164]]) (duration: 01m 03s)
* 19:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4bb9e5d13be702516368774732a9e1711bec42e5}}: Enables the Wikisource extension on oldwikisource ([[phab:T272163|T272163]]) (duration: 01m 04s)
* 19:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/EventLogging/: {{Gerrit|ee830a5ec2051fa970084e89b477a44c384e309c}}: {{Gerrit|f7152a74e00404fc561c44d1c2e37d7f882e2f52}}: EventLogging backport, see commits for details ([[phab:T253121|T253121]]) (duration: 01m 05s)
* 19:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2226.codfw.wmnet
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2375.codfw.wmnet
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2373.codfw.wmnet
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2371.codfw.wmnet
* 19:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2226.codfw.wmnet
* 19:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|62c9c35a76e2d065922f8c9f5a58672240dea7de}}: Migrate SuggestedTagsAction to Event Platform on all wikis ([[phab:T267351|T267351]]) (duration: 01m 03s)
* 19:21 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|0b46c9f1f75fc773f57bfa70521c9eaf20410b9e}}: [no-op] Add notes about load order of Wikisource and Collection extensions ([[phab:T255790|T255790]]) (duration: 01m 11s)
* 19:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2375.codfw.wmnet
* 19:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2373.codfw.wmnet
* 19:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2371.codfw.wmnet
* 19:02 cstone: civicrm revision changed from {{Gerrit|a4caad22b1}} to {{Gerrit|3afb54f6f9}}
* 18:53 razzi@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes - razzi@cumin1001
* 18:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2226.codfw.wmnet with reason: REIMAGE
* 18:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2226.codfw.wmnet with reason: REIMAGE
* 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2373.codfw.wmnet with reason: REIMAGE
* 18:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2375.codfw.wmnet with reason: REIMAGE
* 18:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2371.codfw.wmnet with reason: REIMAGE
* 18:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2375.codfw.wmnet with reason: REIMAGE
* 18:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2373.codfw.wmnet with reason: REIMAGE
* 18:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2371.codfw.wmnet with reason: REIMAGE
* 18:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:14 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:08 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:08 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:42 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:36 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:35 ryankemper: [wdqs] Depooled `wdqs1013` to allow it to catch up on lag
* 16:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
* 16:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
* 16:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
* 15:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
* 15:13 moritzm: installing cairo security updates on stretch
* 15:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 14:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5001.wikimedia.org
* 14:17 godog: roll-restart swift-object in eqiad to apply new concurrency
* 14:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast5001.wikimedia.org
* 14:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4002.wikimedia.org
* 14:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast4002.wikimedia.org
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3004.wikimedia.org
* 13:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast3004.wikimedia.org
* 13:38 XioNoX: put eqiad/esams lumen link back in service
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13872 and previous config saved to /var/cache/conftool/dbconfig/20210121-122043-root.json
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13871 and previous config saved to /var/cache/conftool/dbconfig/20210121-120540-root.json
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13870 and previous config saved to /var/cache/conftool/dbconfig/20210121-115036-root.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13868 and previous config saved to /var/cache/conftool/dbconfig/20210121-113533-root.json
* 11:29 marostegui: Stop replication on db1085 to move wiki replicas under the other sanitarium host
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P13867 and previous config saved to /var/cache/conftool/dbconfig/20210121-112849-marostegui.json
* 11:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 11:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 09:44 hoo: Updated the Wikidata property suggester with data from the 2021-01-11 JSON dump and applied the [[phab:T132839|T132839]] workarounds
* 09:00 marostegui: m1 master restart - [[phab:T271540|T271540]]
* 08:51 jynus: stopping puppet and bacula for backup1001 [[phab:T271540|T271540]]
* 08:43 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 08:37 marostegui: Silence m1 hosts in preparation for the restart [[phab:T271540|T271540]]
* 08:34 godog: roll-restart swift-object in codfw to apply new concurrency
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P13864 and previous config saved to /var/cache/conftool/dbconfig/20210121-072101-marostegui.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repoool db1099:3318', diff saved to https://phabricator.wikimedia.org/P13863 and previous config saved to /var/cache/conftool/dbconfig/20210121-070346-marostegui.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repoool db1099:3318', diff saved to https://phabricator.wikimedia.org/P13862 and previous config saved to /var/cache/conftool/dbconfig/20210121-065459-marostegui.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087', diff saved to https://phabricator.wikimedia.org/P13861 and previous config saved to /var/cache/conftool/dbconfig/20210121-065408-marostegui.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and pool db1099:3318 into s8 vslow', diff saved to https://phabricator.wikimedia.org/P13860 and previous config saved to /var/cache/conftool/dbconfig/20210121-064903-marostegui.json
* 03:54 milimetric@deploy1001: deploy aborted: Minor typo fix (duration: 01m 39s)
* 03:52 milimetric@deploy1001: Started deploy [analytics/refinery@57589e7]: Minor typo fix
* 01:27 ryankemper: [WDQS Deploy] Rollback complete, service health of `wdqs1003` is restored. Need to investigate source of 404 (possibly related to some recent changes we made in the `gui` repo)
* 01:26 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@70f9d37]: 0.3.60 (duration: 02m 53s)
* 01:26 ryankemper: [WDQS Deploy] Rollback of canary `wdqs1003` initiated
* 01:25 ryankemper: [WDQS Deploy] Automated tests passing on canary `wdqs1003` but manually visiting `http://localhost:9999` (my tunnel to `wdqs1003`) gives `404 Not Found` from nginx; aborting deploy
* 01:23 ryankemper@deploy1001: Started deploy [wdqs/wdqs@70f9d37]: 0.3.60
* 01:22 ryankemper: [WDQS Deploy] Tests on canary `wdqs1003` passing before start of deploy, proceeding with deploy of wdqs `0.3.60` to canary
* 00:44 legoktm: legoktm@mwmaint1002:~$ mwscript initSiteStats.php --wiki=trwikivoyage --update
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2369.codfw.wmnet
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2367.codfw.wmnet
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2365.codfw.wmnet
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2363.codfw.wmnet
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2369.codfw.wmnet
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2365.codfw.wmnet
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2367.codfw.wmnet
* 00:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2363.codfw.wmnet


== 2021-08-23 ==
* 23:41 ryankemper: [[phab:T285355|T285355]] `helmfile -e staging -i apply` on `/srv/deployment-charts/helmfile.d/services/linkrecommendation/` from `ryankemper@deploy1002`
* 23:40 ryankemper@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 18:56 tgr: morning deploys done
* 18:56 tgr@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/GrowthExperiments: Backport: [[gerrit:714158{{!}}Add Link: store when tasks were generated (T284551)]] (duration: 00m 57s)
* 18:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:27 dancy@deploy1002: Synchronized wmf-config/etcd.php: Config: [[gerrit:713907{{!}}wmfSetupEtcd only supports array input]] (duration: 00m 57s)
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:23 dancy@deploy1002: Synchronized wmf-config: Config: [[gerrit:713906{{!}}Use array format to specify etcd server]] (duration: 00m 57s)
* 18:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 dancy@deploy1002: Synchronized wmf-config/etcd.php: Config: [[gerrit:713704{{!}}Allow protocol for etcd server to be specified]] (duration: 00m 57s)
* 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:17 ebernhardson@deploy1002: Finished deploy [search/airflow@4c49df7]: ship modern pip/wheel version to support manylinux2014 (pyarrow) (duration: 00m 56s)
* 17:16 ebernhardson@deploy1002: Started deploy [search/airflow@4c49df7]: ship modern pip/wheel version to support manylinux2014 (pyarrow)
* 16:37 ebernhardson@deploy1002: Finished deploy [search/airflow@32f5039]: Add pyarrow lib for hdfs integration (duration: 00m 35s)
* 16:37 ebernhardson@deploy1002: Started deploy [search/airflow@32f5039]: Add pyarrow lib for hdfs integration
* 16:24 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
* 16:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
* 15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26fe6d7a380d4a798f78abf0e722e36c5c63df80}}: ckbwiki: Enable Growth features in dark mode ([[phab:T287867|T287867]]; 3/3) (duration: 00m 56s)
* 14:58 urbanecm@deploy1002: Synchronized wmf-config/config/ckbwiki.yaml: {{Gerrit|26fe6d7a380d4a798f78abf0e722e36c5c63df80}}: ckbwiki: Enable Growth features in dark mode ([[phab:T287867|T287867]]; 2/3) (duration: 00m 57s)
* 14:57 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|26fe6d7a380d4a798f78abf0e722e36c5c63df80}}: ckbwiki: Enable Growth features in dark mode ([[phab:T287867|T287867]]; 1/3) (duration: 00m 57s)
* 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:54 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki-staging/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=ckbwiki --phab=[[phab:T287867|T287867]] # [[phab:T287867|T287867]]
* 14:53 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki-staging/php]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=ckbwiki growthexperiments # [[phab:T287867|T287867]]
* 14:29 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:26 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 14:00 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 13:57 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:56 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 12:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:55 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:713619{{!}}ProductionServices: change rdb* servers in eqiad and codfw (T280582)]] (duration: 00m 57s)
* 11:35 Lucas_WMDE: EU backport+config window done
* 11:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:714334{{!}}Set $wgWBRepoSettings['tmpNormalizeDataValues'] on test wikis (T251480)]] (2/2) (duration: 00m 57s)
* 11:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:714334{{!}}Set $wgWBRepoSettings['tmpNormalizeDataValues'] on test wikis (T251480)]] (1/2) (duration: 00m 58s)
* 11:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:04 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:713860{{!}}Revert "Enable NewUserMessage on hiwiktionary" (T287091)]] (duration: 00m 57s)
* 10:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
* 10:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
* 09:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:714152{{!}}Add extra sleep option between each batch in pruneRevData.php (T289249)]] (duration: 00m 58s)
* 09:55 mbsantos: start re-import OSM planet data into maps1009 eqiad master ([[phab:T288400|T288400]], [[phab:T288897|T288897]])
* 09:53 urbanecm: Deploy security patch for [[phab:T289408|T289408]]
* 09:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:33 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=codfw
* 09:33 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
* 09:02 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
* 09:02 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 09:01 godog: pooling swift in eqiad - [[phab:T288458|T288458]]
* 07:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:714322{{!}}Set request languages rdf output for wikidata to true (T285795)]] (duration: 00m 57s)
* 07:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:28 Amir1: running FlaggedRevs/maintenance/pruneRevData.php on all flaggedrevs wikis
* 07:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:714151{{!}}Avoid calling delete() with empty arrays in PruneFRIncludeData (T289249)]] (duration: 00m 59s)
* 07:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE
* 07:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE


== 2021-08-21 ==
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2021-01-20 ==
* 23:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2369.codfw.wmnet with reason: REIMAGE
* 23:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2365.codfw.wmnet with reason: REIMAGE
* 23:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2367.codfw.wmnet with reason: REIMAGE
* 23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2363.codfw.wmnet with reason: REIMAGE
* 23:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2369.codfw.wmnet with reason: REIMAGE
* 23:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2367.codfw.wmnet with reason: REIMAGE
* 23:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2365.codfw.wmnet with reason: REIMAGE
* 23:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2363.codfw.wmnet with reason: REIMAGE
* 23:30 mutante: releases2002 - rebooting VM
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2361.codfw.wmnet
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2359.codfw.wmnet
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2355.codfw.wmnet
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2357.codfw.wmnet
* 23:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on releases2002.codfw.wmnet with reason: rebooting to add a disk
* 23:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on releases2002.codfw.wmnet with reason: rebooting to add a disk
* 23:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2357.codfw.wmnet
* 23:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2361.codfw.wmnet
* 23:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2359.codfw.wmnet
* 23:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2355.codfw.wmnet
* 23:03 legoktm: updated docker-registry.discovery.wmnet/wikimedia-buster image
* 23:01 mutante: mw2331, mw2333 - scap pull
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2359.codfw.wmnet with reason: new install on buster
* 22:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2359.codfw.wmnet with reason: new install on buster
* 22:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2359.codfw.wmnet with reason: REIMAGE
* 22:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2361.codfw.wmnet with reason: REIMAGE
* 22:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
* 22:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2361.codfw.wmnet with reason: REIMAGE
* 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2359.codfw.wmnet with reason: REIMAGE
* 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
* 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
* 22:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
* 22:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
* 22:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
* 22:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2353.codfw.wmnet
* 22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2351.codfw.wmnet
* 22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2339.codfw.wmnet
* 22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2338.codfw.wmnet
* 22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2353.codfw.wmnet
* 22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2351.codfw.wmnet
* 22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2339.codfw.wmnet
* 22:15 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2338.codfw.wmnet
* 21:35 milimetric@deploy1001: Finished deploy [analytics/refinery@1313244] (thin): Regular analytics weekly train THIN [analytics/refinery@1313244] (duration: 00m 07s)
* 21:35 milimetric@deploy1001: Started deploy [analytics/refinery@1313244] (thin): Regular analytics weekly train THIN [analytics/refinery@1313244]
* 21:34 milimetric@deploy1001: Finished deploy [analytics/refinery@1313244]: Regular analytics weekly train [analytics/refinery@1313244] (duration: 10m 52s)
* 21:24 milimetric@deploy1001: Started deploy [analytics/refinery@1313244]: Regular analytics weekly train [analytics/refinery@1313244]
* 21:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
* 21:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
* 21:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2338.codfw.wmnet with reason: REIMAGE
* 21:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
* 21:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
* 21:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2339.codfw.wmnet with reason: REIMAGE
* 21:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2339.codfw.wmnet with reason: REIMAGE
* 21:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2338.codfw.wmnet with reason: REIMAGE
* 21:13 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 21:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
* 20:56 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
* 20:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
* 20:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2337.codfw.wmnet
* 20:46 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕞🍵 sudo cumin A:cp 'enable-puppet "cdanis deploying {{Gerrit|I558346d}} [[phab:T272330|T272330]]"'
* 20:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2335.codfw.wmnet
* 20:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2333.codfw.wmnet
* 20:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2331.codfw.wmnet
* 20:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2337.codfw.wmnet
* 20:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2335.codfw.wmnet
* 20:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2333.codfw.wmnet
* 20:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2331.codfw.wmnet
* 20:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
* 20:41 effie: restart mc-gp2001, mc-gp2002, mc-gp2003 for [[phab:T269596|T269596]]
* 20:31 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.27 (duration: 03m 05s)
* 20:28 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.27
* 20:23 brennen: 1.36.0-wmf.27 ([[phab:T271341|T271341]]) train: proceeding to group1
* 20:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:17 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕒🍵 sudo cumin A:cp 'disable-puppet "cdanis deploying {{Gerrit|I558346d}} [[phab:T272330|T272330]]"'
* 20:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:06 brennen: 1.36.0-wmf.27 ([[phab:T271341|T271341]]) train status as of deploy window: currently blocked at group0 on [[phab:T272508|T272508]]
* 20:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:50 bblack: lvs1015: bringing pybal back online
* 19:47 bblack: lvs1015: stopping pybal to try to fix a lingering ifup service state issue on the host, which may require downing an interface
* 19:33 urbanecm@deploy1001: Synchronized static/images/project-logos: {{Gerrit|5c941678ec739dd6b5257b4a8f866b7e3a257f45}}: Revert: [enwiki] Update celebration logo to "option A" ([[phab:T272526|T272526]]) (duration: 01m 04s)
* 19:24 effie: depool and repool thumbor* to upgrade python-thumbor-wikimedia to v2.9
* 19:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2337.codfw.wmnet with reason: REIMAGE
* 19:22 urbanecm@deploy1001: Synchronized static/images/project-logos: {{Gerrit|13fb338249b3ec73e380c4971ee697f28a2f6d76}}: [enwiki] Update celebration logo to "option A" ([[phab:T272526|T272526]]) (duration: 01m 05s)
* 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2335.codfw.wmnet with reason: REIMAGE
* 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2333.codfw.wmnet with reason: REIMAGE
* 19:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2337.codfw.wmnet with reason: REIMAGE
* 19:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
* 19:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2335.codfw.wmnet with reason: REIMAGE
* 19:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2333.codfw.wmnet with reason: REIMAGE
* 19:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
* 19:12 urbanecm@deploy1001: Synchronized wmf-config/config/kuwiki.yaml: {{Gerrit|a736d97463e7a42b41dbcff19a8c2c3c62f8bf6d}}: Enable visualeditor on kuwiki by default ([[phab:T270841|T270841]]; 2/2) (duration: 01m 05s)
* 19:11 XioNoX: add BGP to Lumen in eqiad
* 19:11 urbanecm@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|a736d97463e7a42b41dbcff19a8c2c3c62f8bf6d}}: Enable visualeditor on kuwiki by default ([[phab:T270841|T270841]]; 1/2) (duration: 01m 04s)
* 18:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2325.codfw.wmnet
* 18:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2327.codfw.wmnet
* 18:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2329.codfw.wmnet
* 18:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2316.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2329.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2327.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2325.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2316.codfw.wmnet
* 18:42 brennen@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/AbuseFilter/includes/View/AbuseFilterViewDiff.php: Backport: [[gerrit:657366{{!}}Catch ClosestFilterVersionNotFoundException in ViewDiff (T272505)]] (duration: 01m 06s)
* 18:29 bblack: lvs1015: re-enabling puppet + pybal - [[phab:T272258|T272258]]
* 18:25 XioNoX: draining esams-eqiad link
* 18:24 mutante: ganeti - creating 150G virtual hard disk and adding it to releases2002 for [[phab:T272092|T272092]]
* 18:22 mutante: ganeti - creating 105G virtual hard disk and adding it to releases1002 for [[phab:T272092|T272092]]
* 18:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
* 18:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
* 18:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2327.codfw.wmnet with reason: REIMAGE
* 18:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2329.codfw.wmnet with reason: REIMAGE
* 18:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2325.codfw.wmnet with reason: REIMAGE
* 18:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2329.codfw.wmnet with reason: REIMAGE
* 18:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2316.codfw.wmnet with reason: REIMAGE
* 18:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2327.codfw.wmnet with reason: REIMAGE
* 18:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2325.codfw.wmnet with reason: REIMAGE
* 18:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2316.codfw.wmnet with reason: REIMAGE
* 18:01 bblack: lvs1015 - shutdown for [[phab:T272258|T272258]]
* 17:58 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:54 bblack: lvs1015: stopping pybal with puppet disabled for [[phab:T272258|T272258]]
* 17:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 17:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 17:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 17:24 volans@cumin2001: START - Cookbook sre.dns.netbox
* 16:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
* 16:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
* 16:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
* 16:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
* 16:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
* 16:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
* 16:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
* 15:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:58 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
* 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
* 15:55 elukey@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 15:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
* 15:47 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13858 and previous config saved to /var/cache/conftool/dbconfig/20210120-154726-kormat.json
* 15:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
* 15:46 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:46 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:46 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
* 15:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:32 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13857 and previous config saved to /var/cache/conftool/dbconfig/20210120-153223-kormat.json
* 15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:24 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.27
* 15:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
* 15:18 brennen: 1.36.0-wmf.27 train unblocked, proceeding to group0 ([[phab:T271341|T271341]])
* 15:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
* 15:17 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13856 and previous config saved to /var/cache/conftool/dbconfig/20210120-151719-kormat.json
* 15:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
* 15:15 kormat@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13855 and previous config saved to /var/cache/conftool/dbconfig/20210120-151555-kormat.json
* 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
* 15:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
* 15:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
* 15:02 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13854 and previous config saved to /var/cache/conftool/dbconfig/20210120-150216-kormat.json
* 15:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
* 15:00 kormat@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 66%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13853 and previous config saved to /var/cache/conftool/dbconfig/20210120-150051-kormat.json
* 14:59 elukey@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 14:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate QuickSurveys schemas to EventGate on all wikis - [[phab:T271165|T271165]], [[phab:T271166|T271166]] (duration: 01m 05s)
* 14:56 kormat@cumin1001: dbctl commit (dc=all): 'db1109 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13852 and previous config saved to /var/cache/conftool/dbconfig/20210120-145605-kormat.json
* 14:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1109.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1109.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
* 14:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
* 14:47 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate QuickSurveys schemas to EventGate on testwiki - [[phab:T271165|T271165]], [[phab:T271166|T271166]] (duration: 01m 06s)
* 14:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
* 14:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
* 14:45 kormat@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 33%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13851 and previous config saved to /var/cache/conftool/dbconfig/20210120-144547-kormat.json
* 14:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
* 14:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2048.codfw.wmnet
* 14:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
* 14:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 14:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 14:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 14:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 14:26 kormat@cumin1001: dbctl commit (dc=all): 'db1076 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13850 and previous config saved to /var/cache/conftool/dbconfig/20210120-142636-kormat.json
* 14:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1076.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1076.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
* 14:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13849 and previous config saved to /var/cache/conftool/dbconfig/20210120-142139-kormat.json
* 14:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
* 14:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
* 14:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
* 14:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
* 14:12 kormat@cumin1001: dbctl commit (dc=all): 'db1075 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13848 and previous config saved to /var/cache/conftool/dbconfig/20210120-141230-kormat.json
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1075.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1075.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
* 14:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
* 14:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
* 14:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
* 13:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
* 13:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
* 13:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
* 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
* 13:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
* 13:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2039.codfw.wmnet
* 13:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 13:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 13:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 13:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 13:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/Translate/: {{Gerrit|20decbd5cc3de0af655b9419cf69fc442ab056a4}}: Add flag to toggle the usage of the group synchronization cache ([[phab:T272428|T272428]]; [[phab:T182433|T182433]]) (duration: 01m 10s)
* 13:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2039.codfw.wmnet
* 13:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 13:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
* 13:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 13:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 12:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
* 12:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2087.codfw.wmnet with reason: Schema change [[phab:T267767|T267767]]
* 12:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2087.codfw.wmnet with reason: Schema change [[phab:T267767|T267767]]
* 12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2037.codfw.wmnet
* 12:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2037.codfw.wmnet
* 12:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2036.codfw.wmnet
* 12:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2036.codfw.wmnet
* 12:31 godog: bounce icinga on alert1001
* 12:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 12:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 12:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 12:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2035.codfw.wmnet
* 12:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2035.codfw.wmnet
* 12:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2034.codfw.wmnet
* 12:10 matthiasmullie: EU config window done
* 12:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2034.codfw.wmnet
* 12:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2033.codfw.wmnet
* 12:08 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2fc57b259}}: Remove MediaSearch survey (duration: 01m 10s)
* 12:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2033.codfw.wmnet
* 12:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2032.codfw.wmnet
* 11:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2032.codfw.wmnet
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13847 and previous config saved to /var/cache/conftool/dbconfig/20210120-112808-root.json
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13846 and previous config saved to /var/cache/conftool/dbconfig/20210120-111305-root.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13845 and previous config saved to /var/cache/conftool/dbconfig/20210120-105801-root.json
* 10:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2030.codfw.wmnet
* 10:51 XioNoX: Discard the non-whitelisted 172.16.0.0/12 traffic - [[phab:T209082|T209082]]
* 10:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2030.codfw.wmnet
* 10:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2029.codfw.wmnet
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13844 and previous config saved to /var/cache/conftool/dbconfig/20210120-104257-root.json
* 10:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2029.codfw.wmnet
* 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 to stop replication [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13842 and previous config saved to /var/cache/conftool/dbconfig/20210120-103449-marostegui.json
* 10:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
* 10:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2027.codfw.wmnet
* 10:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2027.codfw.wmnet
* 10:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2026.codfw.wmnet
* 10:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2026.codfw.wmnet
* 10:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2025.codfw.wmnet
* 09:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2025.codfw.wmnet
* 09:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2024.codfw.wmnet
* 09:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2024.codfw.wmnet
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2023.codfw.wmnet
* 09:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2023.codfw.wmnet
* 09:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2021.codfw.wmnet
* 09:32 moritzm: installing cuminunpriv1001
* 09:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2021.codfw.wmnet
* 09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2020.codfw.wmnet
* 09:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2020.codfw.wmnet