You are browsing a read-only backup copy of Wikitech. The primary site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(brennen@deploy1001: Finished scap: Synchronizing to pick up i18n for gerrit:639505. Will resume moving train to group1 on Monday morning (US) (T263182) (duration: 69m 02s))
imported>Stashbot
(ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0))
 
(301 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== 2020-11-06 ==
== 2021-10-16 ==
* 00:53 brennen@deploy1001: Finished scap: Synchronizing to pick up i18n for [[gerrit:639505]]. Will resume moving train to group1 on Monday morning (US) ([[phab:T263182|T263182]]) (duration: 69m 02s)
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)


== 2020-11-05 ==
== 2021-10-15 ==
* 23:44 brennen@deploy1001: Started scap: Synchronizing to pick up i18n for [[gerrit:639505]]. Will resume moving train to group1 on Monday morning (US) ([[phab:T263182|T263182]])
* 23:48 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:38 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/includes/media/FormatMetadata.php: Backport: [[gerrit:639505{{!}}media: Support GPSAltitudeRef exif tag - FormatMetData.php (T267370)]] (duration: 07m 22s)
* 23:27 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 23:29 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/languages/i18n/exif: Backport: [[gerrit:639505{{!}}media: Support GPSAltitudeRef exif tag - i18n/exif files (T267370)]] (duration: 01m 08s)
* 23:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:09 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/vendor: Backport: [[gerrit:639504{{!}}Bump wikimedia/parsoid to 0.13.0-a16 (T267146)]] (duration: 01m 14s)
* 22:38 mutante: apt1001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 20:54 hnowlan: reenabled tilerator in eqiad
* 22:36 mutante: apt2001 - removing nginx package, accidentally installed, should just be nginx-light of course, running puppet
* 20:47 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.14
* 22:34 mutante: apt2001 - upgraded nginx
* 20:44 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.16 (duration: 01m 39s)
* 22:18 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:42 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.16
* 22:14 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 20:39 hnowlan: finished removenode of maps2002 cassandra
* 22:05 dpifke@deploy1002: Finished deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes (duration: 00m 05s)
* 20:22 brennen: train: waiting ~15 minutes before rolling forward to group1.
* 22:05 dpifke@deploy1002: Started deploy [performance/arc-lamp@40cb764]: Revert problematic arclamp patch to fix daemon crashes
* 20:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.16
* 21:51 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:15 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/CentralAuth/includes/specials/SpecialCentralAuth.php: Backport: [[gerrit:639500{{!}}Dont double-format numeric edit count (T267362)]] (duration: 01m 06s)
* 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:44 Urbanecm: Morning B&C window done
* 21:44 dzahn@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:44 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.16/extensions/GrowthExperiments/modules/homepage/: {{Gerrit|81cb1c7b141d49d7fc931fdc13ffd1b48b3a25ab}}: Suggested edits: Export task count from start editing dialog ([[phab:T266868|T266868]]; [[phab:T263040|T263040]]) (duration: 01m 07s)
* 21:36 dzahn@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|453b9c64c44a256eafdfafe7a0023484377bbbd2}}: Fix DiscussionTools wikis config for thwiki/tgwiki ([[phab:T266303|T266303]]) (duration: 01m 08s)
* 20:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:32 razzi: shutting down kafka-jumbo1005 to allow dcops to upgrade NIC
* 18:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:52 akosiaris: restart uwsgi-ores in all ores1* nodes per complaint on IRC that max redis clients have been reached [[phab:T263910|T263910]]
* 17:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:51 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.36.0-wmf.14
* 17:17 mutante: gitlab1001 - disabling puppet for debugging
* 17:48 razzi: shutting down kafka-jumbo1004 to allow dcops to upgrade NIC
* 17:05 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold - [[phab:T283076|T283076]]
* 17:46 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.16
* 17:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:41 brennen: train is currently unblocked; rolling to group0 ([[phab:T263182|T263182]])
* 16:50 mutante: gitlab2001 - temp stopped puppet - debugging gitlab restore script with Arnold
* 17:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 17:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:26 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/languages: Backport: [[gerrit:639491{{!}}language: Clean up $separatorTransformTable in km/la/my (T267091)]] (duration: 01m 12s)
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:21 brennen@deploy1001: Synchronized php-1.36.0-wmf.16/resources/Resources.php: Backport: [[gerrit:639495{{!}}mediawiki.action.edit.preview: Add versionCallback to improve startup perf (T266311)]] (duration: 01m 10s)
* 15:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=maps,service=kartotherian,name=maps2002.codfw.wmnet
* 14:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:14 hnowlan: rebuilding cassandra on maps2002
* 14:31 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:14 jayme: imported kubernetes 1.16.15 to component/kubernetes-future stretch-wikimedia
* 14:15 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:05 hnowlan: restarting maps2004 postgres for config change
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 17:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 13:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:57 razzi: shutting down kafka-jumbo1003 to allow dcops to upgrade NIC
* 13:30 elukey: start topic rebalancing for kafka main-eqiad (long maintenance, it will last a couple of days)
* 16:26 razzi: shutting down kafka-jumbo1002 to allow dcops to upgrade NIC
* 13:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:53 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 13:21 vgutierrez: updating acme-chief to version 0.34 on acmechief-test instances - [[phab:T292619|T292619]]
* 15:50 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 13:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 15:41 moritzm: installing junit4 security updates
* 13:14 vgutierrez: upload acme-chief 0.34 to apt.wikimedia.org (buster) - [[phab:T292619|T292619]]
* 14:55 elukey: shutdown kafka-jumbo1001 to swap NICs (1g -> 10g)
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 14:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2007.codfw.wmnet
* 14:10 jbond42: enable puppet fleet wide to post restart puppetdb
* 11:45 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:33 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 11:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2007.codfw.wmnet
* 13:57 jbond42: disable puppet fleet wide to restart puppetdb
* 11:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 13:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:58 jelto: jelto@gitlab1001:~$ sudo disable-puppet "disable puppet on gitlab1001 to test 728380 on GitLab replica - [[phab:T283076|T283076]]"
* 12:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 12:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 06:20 urbanecm: Start server-side upload for 1 video file
* 12:52 jbond42: upgrade freetype on jessie
* 02:14 ryankemper: [[phab:T288231|T288231]] `wdqs2006` data transfer complete and all tests passing on the host. All of `codfw wdqs-internal` is on the new streaming updater
* 12:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 12:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 00:07 brennen: end of UTC late backport & config training window
* 12:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:34 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:34 root@cumin1001: START - Cookbook sre.hosts.downtime
* 12:09 marostegui: Upgrade mysql on pc2010
* 11:58 jynus: shutting down db1139 in preparation of maintenance [[phab:T261405|T261405]]
* 11:55 marostegui: Upgrade mysql on db1077
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1012 to es1 master, es1011 to es2 master, es1014 to es3 (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13230 and previous config saved to /var/cache/conftool/dbconfig/20201105-114223-marostegui.json
* 11:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:05 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=dewiki; [[phab:T246539|T246539]])
* 10:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:55 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 10:16 godog: grafana-rw.wikimedia.org active and sso-enabled - [[phab:T262512|T262512]]
* 09:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13227 and previous config saved to /var/cache/conftool/dbconfig/20201105-094356-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13226 and previous config saved to /var/cache/conftool/dbconfig/20201105-094348-root.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13225 and previous config saved to /var/cache/conftool/dbconfig/20201105-094336-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13224 and previous config saved to /var/cache/conftool/dbconfig/20201105-092853-root.json
* 09:28 moritzm: enabling CAS on grafana1002, editing dashboards will be interrupted for a bit
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13223 and previous config saved to /var/cache/conftool/dbconfig/20201105-092845-root.json
* 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13222 and previous config saved to /var/cache/conftool/dbconfig/20201105-092833-root.json
* 09:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13219 and previous config saved to /var/cache/conftool/dbconfig/20201105-091350-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13218 and previous config saved to /var/cache/conftool/dbconfig/20201105-091341-root.json
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13217 and previous config saved to /var/cache/conftool/dbconfig/20201105-091329-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13216 and previous config saved to /var/cache/conftool/dbconfig/20201105-085846-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13215 and previous config saved to /var/cache/conftool/dbconfig/20201105-085838-root.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13214 and previous config saved to /var/cache/conftool/dbconfig/20201105-085826-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Slowly pool es1031 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13213 and previous config saved to /var/cache/conftool/dbconfig/20201105-084343-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: Slowly pool es1030 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13212 and previous config saved to /var/cache/conftool/dbconfig/20201105-084334-root.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: Slowly pool es1029 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13211 and previous config saved to /var/cache/conftool/dbconfig/20201105-084323-root.json
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312', diff saved to https://phabricator.wikimedia.org/P13210 and previous config saved to /var/cache/conftool/dbconfig/20201105-084250-marostegui.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312', diff saved to https://phabricator.wikimedia.org/P13209 and previous config saved to /var/cache/conftool/dbconfig/20201105-083304-marostegui.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13208 and previous config saved to /var/cache/conftool/dbconfig/20201105-083142-root.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13207 and previous config saved to /var/cache/conftool/dbconfig/20201105-081638-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13206 and previous config saved to /var/cache/conftool/dbconfig/20201105-080135-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1031 on es3 with minimium weight after being cloned from es1017 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13205 and previous config saved to /var/cache/conftool/dbconfig/20201105-075625-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1030 on es2 with minimium weight after being cloned from es1013 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13204 and previous config saved to /var/cache/conftool/dbconfig/20201105-075507-marostegui.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1029 on es1 with minimium weight after being cloned from es1016 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13203 and previous config saved to /var/cache/conftool/dbconfig/20201105-075358-marostegui.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: After regenerating table stats', diff saved to https://phabricator.wikimedia.org/P13202 and previous config saved to /var/cache/conftool/dbconfig/20201105-074631-root.json
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T267216|T267216]]', diff saved to https://phabricator.wikimedia.org/P13201 and previous config saved to /var/cache/conftool/dbconfig/20201105-072352-marostegui.json
* 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 100%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13200 and previous config saved to /var/cache/conftool/dbconfig/20201105-071017-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 100%: After cloning es1030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13199 and previous config saved to /var/cache/conftool/dbconfig/20201105-070616-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 100%: After cloning es1031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13198 and previous config saved to /var/cache/conftool/dbconfig/20201105-070610-root.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 75%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13197 and previous config saved to /var/cache/conftool/dbconfig/20201105-065514-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 75%: After cloning es1030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13196 and previous config saved to /var/cache/conftool/dbconfig/20201105-065113-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 75%: After cloning es1031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13195 and previous config saved to /var/cache/conftool/dbconfig/20201105-065107-root.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 50%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13193 and previous config saved to /var/cache/conftool/dbconfig/20201105-064010-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 50%: After cloning es1030 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13192 and previous config saved to /var/cache/conftool/dbconfig/20201105-063610-root.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 50%: After cloning es1031 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13191 and previous config saved to /var/cache/conftool/dbconfig/20201105-063603-root.json
* 06:34 elukey: truncate application_1601916545561_129457's taskmanager.log (~600G) on an-worker1113 due to partition 'e' full
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es1016 (re)pooling @ 25%: After cloning es1029 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13190 and previous config saved to /var/cache/conftool/dbconfig/20201105-062507-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1013 (re)pooling @ 25%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13189 and previous config saved to /var/cache/conftool/dbconfig/20201105-062454-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1017 (re)pooling @ 25%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13188 and previous config saved to /var/cache/conftool/dbconfig/20201105-062446-root.json
* 01:57 milimetric@deploy1001: Finished deploy [analytics/refinery@6913407] (thin): Regular analytics weekly train THIN [analytics/refinery@6913407] (duration: 00m 08s)
* 01:56 milimetric@deploy1001: Started deploy [analytics/refinery@6913407] (thin): Regular analytics weekly train THIN [analytics/refinery@6913407]
* 01:56 milimetric@deploy1001: Finished deploy [analytics/refinery@6913407]: Regular analytics weekly train [analytics/refinery@6913407] (duration: 08m 34s)
* 01:47 milimetric@deploy1001: Started deploy [analytics/refinery@6913407]: Regular analytics weekly train [analytics/refinery@6913407]


== 2020-11-04 ==
== 2021-10-14 ==
* 20:36 Urbanecm: Late B&C Morning window completed, deployment host is clear
* 23:59 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 20:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee0ba541fa55f6707276fdc5bd3f032cb9be3e60}}: Disable the search in header A/B test ([[phab:T265333|T265333]]) (duration: 01m 06s)
* 23:58 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 55s)
* 20:33 ejegg: updated payments-wiki from {{Gerrit|1ad4ba9639}} to {{Gerrit|388490e86d}}
* 23:56 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730737{{!}}Change Kashmiri Wikipedia logo (T293342)]] (duration: 00m 56s)
* 20:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate NewcomerTask event stream to EventGate on testwiki - [[phab:T259163|T259163]] (duration: 01m 07s)
* 23:49 cjming@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 20:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|82579bf9d71bd3c9d97da0132ce8d92a8863da5b}}: Enable wgImagePreconnect on remaining wikis ([[phab:T123582|T123582]]) (duration: 01m 06s)
* 23:48 cjming@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 55s)
* 20:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2a57725f8f6fdaa3f40c834e84b43a0260077f2}}: Enable DiscussionTools as a beta feature on almost all Wikipedias ([[phab:T266303|T266303]]) (duration: 01m 07s)
* 23:46 cjming@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:730736{{!}}Change Kashmiri Wiktionary logo (T293373)]] (duration: 00m 56s)
* 20:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fb5c03262c20b5e99b3c2f6e91abb024f12da1f5}}: Enable wgCheckUserLogLogins at all wikis but loginwiki ([[phab:T253802|T253802]]) (duration: 01m 08s)
* 23:43 ejegg: updated payments-wiki from {{Gerrit|19d18c1852}} to {{Gerrit|0f48acea49}}
* 19:59 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.16 (duration: 62m 44s)
* 23:34 cjming@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/WikimediaEvents/includes/VectorPrefDiffInstrumentation.php: Backport: [[gerrit:730733{{!}}Change VectorPrefDiffInstrumentation stream name to `mediawiki.skin_diff` (T289622)]] (duration: 00m 56s)
* 18:57 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.16
* 23:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730936{{!}}allow sysops to add and remove users to other groups on ptwikivoyage (T292806)]] (duration: 00m 56s)
* 18:52 brennen@deploy1001: Pruned MediaWiki: 1.36.0-wmf.10 (duration: 27m 38s)
* 23:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 18:51 Urbanecm: Strip 2FA for Mark83 at SUL ([[phab:T267257|T267257]])
* 23:11 brennen@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730933{{!}}Add americanantiquarian.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T292918)]] (duration: 00m 57s)
* 18:20 elukey: restart memcached on mc1036 to pick up new settings (see https://gerrit.wikimedia.org/r/639099)
* 23:11 mutante: mw1452 - re-pooled, scap pull
* 18:15 hknust: holger@mwmaint1002 END - Run updateRestrictions.php
* 23:09 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 17:44 hknust: holger@mwmaint1002 START - Run updateRestrictions.php
* 22:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:35 ryankemper: [[phab:T288231|T288231]] Ran puppet on `wdqs2006`, now back to the cookbook run
* 17:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime
* 22:33 ryankemper: [[phab:T288231|T288231]] Forgot about running puppet-agent on `wdqs2006`; aborted cookbook run
* 17:15 zpapierski@deploy1001: Finished deploy [wikimedia/discovery/analytics@8e8d2d4]: Deploying dc switch (duration: 01m 15s)
* 22:33 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 17:13 zpapierski@deploy1001: Started deploy [wikimedia/discovery/analytics@8e8d2d4]: Deploying dc switch
* 22:33 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:07 effie: Reimage mc1036 for real this time
* 22:32 ryankemper: [[phab:T288231|T288231]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/730795; proceeding to data-transfer on `wdqs2006`: `sudo rm -fv /srv/wdqs/data_loaded` on `wdqs2006` followed by `ryankemper@cumin1001:~$ sudo cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "streaming updater cutover for wdqs2005" --blazegraph_instance blazegraph --task-id [[phab:T288231|T288231]]`
* 16:40 brennen: 1.36.0-wmf.16 was branched at {{Gerrit|f51ccd2ccef8cba0e7d874b6f7cf4b73bcd36636}} for [[phab:T263182|T263182]]
* 22:31 mutante: depooling mw1452 for testig
* 16:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:28 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo pool`: transfer completed successfully; tests passing on host (used `ssh -L 9999:localhost:80 wdqs2005.codfw.wmnet` to establish tunnel)
* 16:10 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 22:23 dpifke@deploy1002: Finished deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]] (duration: 00m 05s)
* 15:39 effie: Reimage mc1036 to buster - [[phab:T252391|T252391]]
* 22:23 dpifke@deploy1002: Started deploy [performance/arc-lamp@84fe496]: New flamegraph.pl from upstream [[phab:T291898|T291898]]
* 15:25 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate ContentTranslationAbuseFilter event stream to EventGate on all wikis - [[phab:T259163|T259163]] (duration: 00m 58s)
* 22:17 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:07 eileen: civicrm revision changed from {{Gerrit|018d3b19fe}} to {{Gerrit|9b5e0d015b}}, config revision is {{Gerrit|781d6a1b1f}}
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 21:34 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:25 robh@cumin1001: START - Cookbook sre.dns.netbox
* 15:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 21:10 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:09 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate ContentTranslationAbuseFilter event stream to EventGate on testwiki - [[phab:T259163|T259163]] (duration: 00m 59s)
* 21:06 robh@cumin1001: START - Cookbook sre.dns.netbox
* 14:37 jynus: restart mysql at db1133 [[phab:T266483|T266483]]
* 19:45 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:23 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 19:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:53 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:53 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=dagwiki --fix
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:47 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=frwiktionary --logwiki=metawiki 'TURK FASTER' 'ARTHUR MORGAN'
* 14:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:42 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'George Dum Fulton' 'George Fulton' # [[phab:T293403|T293403]]
* 14:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:41 urbanecm: UTC evening B&C done
* 14:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:40 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/MediaSearch/extension.json: {{Gerrit|6da3523daaba85a4199721980c0a9c96b20697e7}}: Fix assessment quickview labels ([[phab:T292596|T292596]]) (duration: 01m 03s)
* 14:17 elukey: upload hue 4.8.0-1+deb10u1 to buster-wikimedia
* 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c8dffefd0d095abe3709dcc962d5d24f27b55869}}: Create Salima namespace for dagwiki ([[phab:T289911|T289911]]) (duration: 01m 04s)
* 14:15 jynus: restart mysqls at db209[789],db210[01], db2139, db2141 [[phab:T266483|T266483]]
* 18:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 14:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bccd4bc45498db8628567574d0bb3a23f8fb378}}: Add $wgSitename and $wgMetaNamespace for kswiki and kswiktionary ([[phab:T289752|T289752]], [[phab:T289767|T289767]]) (duration: 01m 04s)
* 14:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 13:59 jynus: restart mysqls at db1150 [[phab:T266483|T266483]]
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|262e588b44f126fb9e1aa933a3ca59b191b42bd7}}: Enable Growth mentor dashboard backend on all wikis ([[phab:T278920|T278920]]) (duration: 01m 05s)
* 13:54 jynus: restart mysqls at db1145 [[phab:T266483|T266483]]
* 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|41baa8c41d64510986f009b9be2d70dad0915f8c}}: Add new mediawiki.skin_diff event logging stream ([[phab:T289622|T289622]]) (duration: 01m 05s)
* 13:51 jynus: restart mysqls at db1140 [[phab:T266483|T266483]]
* 18:03 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 13:47 jynus: restart mysqls at db1139 [[phab:T266483|T266483]]
* 18:02 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 13:43 jynus: restart mysqls at db1116 [[phab:T266483|T266483]]
* 18:01 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 13:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:54 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 13:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 17:52 rzl: repooled mw1452 (with `sudo pool` so no auto log from conftool)
* 13:40 jynus: restart mysqls at db1102 [[phab:T266483|T266483]]
* 17:47 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 13:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:45 rzl@cumin1001: conftool action : set/pooled=no; selector: name=mw1452.eqiad.wmnet
* 13:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 17:42 rzl: depool mw1452 for training
* 13:35 jynus: restart mysqls at db1095 [[phab:T266483|T266483]]
* 17:32 addshore@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 13:24 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:31 addshore@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 13:24 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 17:29 addshore@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 13:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 13:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:44 ryankemper: [[phab:T288231|T288231]] Manually killed dangling `pigz` / `nc` processes on `wdqs2008` (and `wdqs2005` implicitly). Should be in the right state to re-start the `data-transfer` cookbook from again
* 12:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:41 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 12:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:37 elukey: drop kubeflow-kfserving* docker images from deneb
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:34 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:33 moritzm: installing node-ansi-regex security updates
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:28 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere (duration: 02m 24s)
* 12:50 Lucas_WMDE: EU backport&config done
* 16:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@4bff2d1]: Force mirrored traffic to 0% for everywhere
* 12:11 Urbanecm: Run scap pull at snapshot1010 manually
* 16:24 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 04s)
* 12:09 Urbanecm: scap-sync file returned `snapshot1010.eqiad.wmnet returned [255]: Host key verification failed.`
* 16:16 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad (duration: 02m 41s)
* 12:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ed3c43dc4488205663e6694b7ddfa991e3f3d4b9}}: Add www.irishstatutebook.ie to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T267193|T267193]]) (duration: 01m 02s)
* 16:14 mbsantos@deploy1002: Started deploy [kartotherian/deploy@071f7c3]: Increase mirrored traffic to 100% for eqiad
* 11:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 11:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:07 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 11:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:07 ryankemper: [[phab:T288231|T288231]] About to ctrl+c out of ongoing data transfer because puppet run following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/730794 restarted blazegraph; we'll manually disable updater and kick off the transfer again
* 11:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:04 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo run-puppet-agent --force`
* 11:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 11:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:54 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2008:~$ sudo depool`
* 11:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:52 ryankemper: [[phab:T288231|T288231]] `ryankemper@wdqs2005:~$ sudo depool`
* 11:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 11:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 11:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:13 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 11:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/VisualEditor/includes/VisualEditorHooks.php: Backport: [[gerrit:730729{{!}}Fix value of 'namespacesWithSubpages' in wgVisualEditorConfig (T293310)]] (duration: 01m 04s)
* 11:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/CollectionHooks.php: Backport: [[gerrit:730580{{!}}Check that the timestamp  key/value is set to avoid undefined offset (T293300)]] (duration: 01m 03s)
* 11:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:00 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 11:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2026.codfw.wmnet to ganeti-test01.svc.codfw.wmnet
* 11:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:53 kormat: upgrading orchestrator.wm.o to 3.2.6-1 [[phab:T275784|T275784]]
* 11:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:49 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=apt
* 11:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:43 jbond: migrate apt.w.o to a dns active/passiev discovery address (cc moritzm)
* 11:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:23 moritzm: installing krb5 security updates on KDCs
* 11:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 11:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:10 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 2/2) (duration: 01m 03s)
* 11:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:07 urbanecm: Run extensions/GrowthExperiments/initWikiConfig.php for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 11:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:07 urbanecm: Create growthexperiments DB tables for ganwiki, iuwiki, tgwiki ([[phab:T291826|T291826]])
* 10:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 10:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:23 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13185 and previous config saved to /var/cache/conftool/dbconfig/20201104-102341-kormat.json
* 14:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:23 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint1002 (wiki=fiwiki; [[phab:T246539|T246539]])
* 14:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b35adfc59eec9c19b509bb9439cdfe33978a4f8b}}: Deploy Growth wikis to 4 wikis in dark mode ([[phab:T291826|T291826]]; 1/2) (duration: 01m 04s)
* 10:17 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13184 and previous config saved to /var/cache/conftool/dbconfig/20201104-101729-kormat.json
* 14:03 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|82d0a4bf45126ecba2cfcd1a0c2081a00f58dca3}}: Enable VE by default on 4 more wikis ([[phab:T290614|T290614]]) (duration: 01m 05s)
* 10:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 10:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:08 _joe_: restarting envoyproxy on all of restbase codfw, sending the command in parallel via cumin, to test poolcounter usage by the safe restart scripts
* 13:54 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:05 _joe_: restarting envoyproxy on restbase20<nowiki>{</nowiki>09,10<nowiki>}</nowiki> to test poolcounter usage by the safe restart scripts
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 09:24 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 09:24 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 13:14 kormat: uploaded orchestrator 3.2.6-1 packages to apt.wm.o (buster) [[phab:T275784|T275784]]
* 09:19 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2026.codfw.wmnet with OS buster
* 09:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:44 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 09:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backupse
* 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on cloudbackup2002.codfw.wmnet with reason: working on cinder backupse
* 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:19 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:730746{{!}}Untangle “dispatch via jobs” settings in Wikibase.php (T291828)]] (no-op) (duration: 01m 04s)
* 08:44 moritzm: uploaded freetype 2.5.2+deb8u4+wmf1 to apt.wikimedia.org/jessie-wikimedia
* 12:12 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730725{{!}}Set wmgWikibaseDispatchViaJobsPruneChangesTableInJobEnabled for wikidatawiki (T291828)]] (no-op) (duration: 01m 05s)
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13182 and previous config saved to /var/cache/conftool/dbconfig/20201104-080033-root.json
* 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2026.codfw.wmnet with OS buster
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13181 and previous config saved to /var/cache/conftool/dbconfig/20201104-080024-root.json
* 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13180 and previous config saved to /var/cache/conftool/dbconfig/20201104-075953-root.json
* 11:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13179 and previous config saved to /var/cache/conftool/dbconfig/20201104-074530-root.json
* 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 75%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13178 and previous config saved to /var/cache/conftool/dbconfig/20201104-074520-root.json
* 11:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13177 and previous config saved to /var/cache/conftool/dbconfig/20201104-074449-root.json
* 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2002.codfw.wmnet
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13176 and previous config saved to /var/cache/conftool/dbconfig/20201104-073026-root.json
* 10:38 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2002.codfw.wmnet
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13175 and previous config saved to /var/cache/conftool/dbconfig/20201104-073017-root.json
* 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
* 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13174 and previous config saved to /var/cache/conftool/dbconfig/20201104-072946-root.json
* 10:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/: {{Gerrit|1f33fc3}}, {{Gerrit|e0ea1b8}}, {{Gerrit|cba2ac9}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 05s)
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13173 and previous config saved to /var/cache/conftool/dbconfig/20201104-071523-root.json
* 10:33 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|465b564}}, {{Gerrit|a8cc98b}}, {{Gerrit|6e95c48}}: GrowthExperiments backports ([[phab:T290609|T290609]]) (duration: 01m 06s)
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 25%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13172 and previous config saved to /var/cache/conftool/dbconfig/20201104-071513-root.json
* 10:32 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13171 and previous config saved to /var/cache/conftool/dbconfig/20201104-071443-root.json
* 09:20 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 07:09 elukey: manual cleanup of mcelog and its wmf-auto-restart (failing) on mw1381 (kernel 4.19, doesn't support mcelog)
* 09:20 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1016 es1013 es1017 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13170 and previous config saved to /var/cache/conftool/dbconfig/20201104-070121-marostegui.json
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 07:00 marostegui: Stop mysql on es1016, es1013, es1017 to clone es1029, es1030, es1031 [[phab:T261717|T261717]]
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: Slowly pool es1028 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13169 and previous config saved to /var/cache/conftool/dbconfig/20201104-070020-root.json
* 09:19 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: Slowly pool es1027 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13168 and previous config saved to /var/cache/conftool/dbconfig/20201104-070010-root.json
* 09:19 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: Slowly pool es1026 after being recloned [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13167 and previous config saved to /var/cache/conftool/dbconfig/20201104-065939-root.json
* 09:18 volans@deploy1002: Finished deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1 (duration: 00m 50s)
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 100%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13166 and previous config saved to /var/cache/conftool/dbconfig/20201104-065926-root.json
* 09:17 volans@deploy1002: Started deploy [debmonitor/deploy@ab62ac5]: Release v0.3.1
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 100%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13165 and previous config saved to /var/cache/conftool/dbconfig/20201104-065905-root.json
* 09:04 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 45s)
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 100%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13164 and previous config saved to /var/cache/conftool/dbconfig/20201104-065849-root.json
* 09:03 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 06:52 elukey: force start of rasdaemon.service on dumpsdata1002 (its auto-restart unit was failing for it)
* 09:02 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.1 (duration: 00m 23s)
* 06:47 elukey: set an-presto1004's netbox status as "active" (was: failed) after hw maintenance - [[phab:T253438|T253438]]
* 09:02 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.1
* 06:44 elukey: force restart of uwsgi-ores on ores1005 - daemon down after reload, max client reached error messages in the logs
* 08:52 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 75%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13163 and previous config saved to /var/cache/conftool/dbconfig/20201104-064422-root.json
* 08:52 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 75%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13162 and previous config saved to /var/cache/conftool/dbconfig/20201104-064402-root.json
* 08:51 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 75%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13161 and previous config saved to /var/cache/conftool/dbconfig/20201104-064345-root.json
* 08:51 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1028 with minimum weight after recloning [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13160 and previous config saved to /var/cache/conftool/dbconfig/20201104-063028-marostegui.json
* 08:22 volans: rolling out debmonitor-client upgrade to 0.3.1 across the fleet
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 50%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13159 and previous config saved to /var/cache/conftool/dbconfig/20201104-062919-root.json
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 50%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13158 and previous config saved to /var/cache/conftool/dbconfig/20201104-062858-root.json
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 50%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13157 and previous config saved to /var/cache/conftool/dbconfig/20201104-062842-root.json
* 07:25 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1027 with minimum weight after recloning [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13156 and previous config saved to /var/cache/conftool/dbconfig/20201104-061829-marostegui.json
* 07:25 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool es1026 with minimum weight after recloning [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13155 and previous config saved to /var/cache/conftool/dbconfig/20201104-061549-marostegui.json
* 07:24 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'es1014 (re)pooling @ 25%: After cloning es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13154 and previous config saved to /var/cache/conftool/dbconfig/20201104-061416-root.json
* 07:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1012 (re)pooling @ 25%: After cloning es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13153 and previous config saved to /var/cache/conftool/dbconfig/20201104-061355-root.json
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1011 (re)pooling @ 25%: After cloning es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13152 and previous config saved to /var/cache/conftool/dbconfig/20201104-061339-root.json
* 07:18 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 07:17 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:37 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:52 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 01:50 foks: changing user email for "Region of Peel Archives"
* 01:41 ejegg: updated payments-wiki from {{Gerrit|b329d2dea2}} to {{Gerrit|19d18c1852}}
* 01:35 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 01:31 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .


== 2020-11-03 ==
== 2021-10-13 ==
* 22:56 _joe_: repooling mw1346
* 23:37 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:55 _joe_: depooling mw1346
* 23:36 eileen: civicrm revision changed from {{Gerrit|946dfb6c5a}} to {{Gerrit|018d3b19fe}}, config revision is {{Gerrit|85277466ed}}
* 22:49 cdanis: mw1342 restart-php7.2-fpm
* 23:36 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:730575{{!}}Create an alias for the project namespace on kswiki (T291740)]] (duration: 01m 05s)
* 22:37 cdanis: repool mw1278 and mw1279
* 22:30 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:35 cdanis: ✔️ cdanis@mw1290.eqiad.wmnet ~ 🕠🍺 sudo restart-php7.2-fpm
* 22:01 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Collection/includes/Specials/SpecialCollection.php: Backport: [[gerrit:730578{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 22:34 cdanis: restart-php7.2-fpm and pool on mw1276
* 21:50 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection: Backport: [[gerrit:730577{{!}}Api: Avoid trying to access undefined offset in a user's collection (T293261)]] (duration: 01m 04s)
* 22:31 cdanis: depool mw1276 and mw1279 also
* 21:47 foks: removing 8 files for legal compliance
* 22:25 cdanis: ✔️ cdanis@mw1278.eqiad.wmnet ~ 🕠🍺 sudo depool
* 21:03 foks: removing 2 files for legal compliance
* 21:16 hashar: Gerrit: triggering java garbage collection # [[phab:T263008|T263008]]
* 21:00 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:32 gehel: restarting blazegraph on wdqs1007 to reset ban list
* 20:50 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:49 brennen@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Collection/includes/Api/ApiGetBookCreatorBoxContent.php: Backport: [[gerrit:730574{{!}}Fall back to main page if given title is invalid (T293299)]] (duration: 01m 04s)
* 18:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:46 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:45 cmjohnson1: shutting elastic1063 down to reseat DIMM [[phab:T265113|T265113]]
* 20:40 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 17:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:31 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 20:27 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS stretch
* 17:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:04 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 17:31 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 20:03 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kubernetes1021.eqiad.wmnet with OS stretch
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS stretch
* 17:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 19:18 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:16 mutante: gitlab2001 - status before was that "gitlab-ctl status" showed components "gitlab-workhorse" and "postgres-exporter" as "down". this was either pre-broken or caused by the restore process. after manually 'gitlab-ctl start gitlab-workhorse' all of the components are in "run" and https://gitlab-replica.wikimedia.org is up ( [[phab:T285867|T285867]])
* 17:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 19:08 mutante: gitl1b2001 - started workhorse which was for some reason marked as down after restore command ran
* 16:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:08 mutante: [gitlab2001:~] $ sudo /usr/bin/gitlab-ctl start gitlab-workhorse
* 16:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]] (duration: 01m 03s)
* 16:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:05 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 16:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 19:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87879865c35edab3ead523027681146e00d6fc02}}: Create Translation namespace for viwikisource ([[phab:T290691|T290691]]) (duration: 01m 04s)
* 16:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|06fd0f225575448771cdba0d4e6bf36bb6715bc1}}: add extendedconfimed for autoreview group on ptwiki ([[phab:T292912|T292912]]) (duration: 01m 04s)
* 16:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:37 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript initSiteStats.php --wiki=ptwiki --update
* 16:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=ptwiki extendedconfirmed
* 16:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0bb2b388217aa91a39ed3684f87fdf7edb06fd81}}:  Set autoconfirmedextended and confirmedextended for ptwiki ([[phab:T292915|T292915]]) (duration: 01m 04s)
* 16:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:16 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|694bc234ab5dbb9a2387a6129998d45a53ac0ab3}}: Remove an old dawiki temporary logo (duration: 01m 04s)
* 16:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|224e2a374b1cc6327e9d8c2bca576091ce4efc74}}: Add NS_MAIN back to wgExtraSignatureNamespaces for mediawikiwiki ([[phab:T291630|T291630]]) (duration: 01m 05s)
* 16:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:12 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:13 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|1b96f54a518620b0dc6a0ab63b402d0ea2c6bf70}}: Update logo for liwiktionary ([[phab:T291479|T291479]]) (duration: 01m 14s)
* 16:13 cdanis@cumin1001: START - Cookbook sre.network.cf
* 18:10 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:04 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:10 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:03 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:09 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 16:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:09 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 16:01 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:08 volans: uploaded debmonitor-client_0.3.1 to apt.wikimedia.org stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
* 15:59 elukey: shutdown kafka-jumbo1006 to replace 1G with 10G nic
* 17:14 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|dd7a3314602ffddc5b917cccc71c917301639388}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 04s)
* 15:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:13 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/maintenance/initWikiConfig.php: {{Gerrit|5c27154cf434bebc37f5e98e2ad1b5cea7cde1d4}}: initWikiConfig: Fix loading difficulty/group from SUGGESTED_EDITS_TASK_TYPES ([[phab:T293219|T293219]]) (duration: 01m 15s)
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:57 mutante: stat1008 - short on disk space, mostly used in /tmp, high CPU usage by R proccess, sent a message about it to all shell users via wall
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:50 mutante: stat1008 - apt-get clean - freed 1.3 GB disk space - was alerting in Icinga because / was 97% full
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 16:37 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 15:08 moritzm: imported php-redis/xdebug to component/php72 for buster-wikimedia
* 16:37 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:37 moritzm: imported php-apcu-bc/php-igbinary/tideways-xhprof to component/php72 for buster-wikimedia
* 16:23 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:23 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:29 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:33 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:28 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:26 volans@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 14:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:26 volans@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 14:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 13:53 moritzm: imported php-mongodb/php-wmerrors/wikidiff2 to component/php72 for buster-wikimedia
* 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 13:43 sobanski: Removing db1091 from tendril and zarcillo [[phab:T267088|T267088]]
* 15:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 13:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:12 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 13:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:12 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 13:33 lsobanski@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 13:24 lsobanski@cumin1001: START - Cookbook sre.hosts.decommission
* 15:04 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:22 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:03 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 13:22 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:03 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:01 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:01 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 12:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:01 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 12:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:58 moritzm: imported php-apcu/php-geoip/php-imagick/php-mailparse to component/php72 for buster-wikimedia
* 14:59 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 11:57 moritzm: running "reprepro clearvanished" to prune thirdparty/orchestrator
* 14:59 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 11:51 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]] (duration: 00m 03s)
* 14:57 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 11:51 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]]
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 11:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:56 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 11:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:56 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 11:23 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 11:23 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 14:54 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 11:23 hnowlan: resyncing postgres replica maps1001
* 14:52 ema: repool cp4021, further testing can be performed on sretest1001 [[phab:T201317|T201317]]
* 11:03 Amir1: rolling restart of ores
* 14:51 volans: restarting ircecho.service on alert1001 to get back icinga-wm without the underscore
* 10:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:50 elukey: restart pybal on lvs1015 (low-traffic primary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 10:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:48 moritzm: reverted to clean package state on deneb
* 10:45 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]] (duration: 00m 07s)
* 14:44 elukey@puppetmaster1001: conftool action : ge; selector: cluster=ml_serve,service=inference
* 10:45 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]]
* 14:36 elukey: restart pybal on lvs1016 (low-traffic secondary) to pick up new config for inference.discovery.wmnet - [[phab:T289835|T289835]]
* 10:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:27 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:27 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:22 gilles@deploy1001: Finished deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]] (duration: 00m 26s)
* 14:25 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:21 gilles@deploy1001: Started deploy [performance/asoranking@2a2cb05]: [[phab:T266985|T266985]]
* 14:25 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:16 elukey@deploy1001: Finished deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided) (duration: 02m 15s)
* 14:21 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:14 elukey@deploy1001: Started deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided)
* 14:21 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:13 elukey@deploy1001: Finished deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided) (duration: 01m 45s)
* 14:20 moritzm: temporarily downgrade sphinx packages on deneb to 1.7.9-1~bpo9+1 to build a Ganeti 2.16 stretch backport with delicate toolchain needs
* 10:11 elukey@deploy1001: Started deploy [analytics/refinery@cf5db74] (hadoop-test): (no justification provided)
* 14:13 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 10:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:13 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 10:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:57 kormat: uploaded orchestrator 3.2.3-2 to apt
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:10 jbond@cumin1001: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
* 09:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 14:10 jbond@cumin1001: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: sretest1001.eqiad.wmnet
* 09:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:59 XioNoX: push prep-work for anycast tuning in ulsfo - [[phab:T288843|T288843]]
* 09:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:38 jayme: imported helm-diff_3.1.3-2 to buster-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/helm-diff/+/730509)
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:37 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 13:34 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:13 Lucas_WMDE: UTC morning backport+config window done
* 09:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:12 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/includes: Backport: [[gerrit:730370{{!}}Add Link: Do not log "no suggestion found" errors in production log (T291251)]] (duration: 01m 04s)
* 09:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:11 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]' # after applying 730512 at mwmaint1002 to workaround [[phab:T293219|T293219]] # [[phab:T255037|T255037]]
* 09:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 12:11 kharlan@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/modules: Backport: [[gerrit:730371{{!}}Suggested Edits: Update local config.presets when topics/difficulty presets change (T292536)]] (duration: 01m 07s)
* 09:05 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13139 and previous config saved to /var/cache/conftool/dbconfig/20201103-090523-kormat.json
* 11:56 urbanecm@deploy1002: Synchronized wmf-config/config/itwiki.yaml: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]) (duration: 01m 04s)
* 09:00 kormat@cumin1001: dbctl commit (dc=all): 'Rebooting for [[phab:T261389|T261389]]', diff saved to https://phabricator.wikimedia.org/P13138 and previous config saved to /var/cache/conftool/dbconfig/20201103-090013-kormat.json
* 11:55 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=mediawikiwiki "Growth/Communities/How to introduce yourself as a mentor" "Growth/Communities/How to configure the mentors' list" "Martin Urbanec (WMF)" --reason '[[:phab:T293184]]' # [[phab:T293184|T293184]]
* 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:55 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 2/3) (duration: 01m 04s)
* 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|38a019d4fd6ff8e7cf92f5e7c6a899c336f20235}}: itwiki: Deploy Growth features in dark mode ([[phab:T255037|T255037]]; 1/3) (duration: 01m 05s)
* 08:32 godog: Prometheus re-enable compactions - [[phab:T261281|T261281]]
* 11:50 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=itwiki --phab='[[phab:T255037|T255037]]' # [[phab:T255037|T255037]]
* 06:59 marostegui: Remove db1091 from tendril and zarcillo [[phab:T267088|T267088]]
* 11:49 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=itwiki growthexperiments # [[phab:T255037|T255037]]
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1091 from dbctl [[phab:T267088|T267088]]', diff saved to https://phabricator.wikimedia.org/P13137 and previous config saved to /var/cache/conftool/dbconfig/20201103-065756-marostegui.json
* 11:48 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/Wikibase/repo/: Backport: [[gerrit:730380{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 07s)
* 06:46 marostegui: Deploy schema change on s1 codfw master: [[phab:T265349|T265349]]
* 11:43 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Wikibase/repo/: Backport: [[gerrit:730385{{!}}Instantiate ItemId for SiteLinkConflictLookup results (T293104)]] (duration: 01m 18s)
* 06:16 marostegui: Stop MySQL on es1014 to clone es1028 [[phab:T261717|T261717]]
* 11:33 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1014 to reclone es1028 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13136 and previous config saved to /var/cache/conftool/dbconfig/20201103-061423-marostegui.json
* 11:19 ema: pool cp4021 after reimage [[phab:T201317|T201317]]
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1019 to es3 master (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13135 and previous config saved to /var/cache/conftool/dbconfig/20201103-061403-marostegui.json
* 11:05 ema@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4021.ulsfo.wmnet with OS buster
* 06:11 marostegui: Stop MySQL on es1012 to clone es1027 [[phab:T261717|T261717]]
* 10:15 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1012 to reclone es1027 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13134 and previous config saved to /var/cache/conftool/dbconfig/20201103-060727-marostegui.json
* 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1018 to es1 master (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13133 and previous config saved to /var/cache/conftool/dbconfig/20201103-060705-marostegui.json
* 10:09 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:728490{{!}}Add more types of QuickSurveys on beta cluster (T292459)]] (duration: 01m 53s)
* 06:04 marostegui: Stop MySQL on es1011 to clone es1026 [[phab:T261717|T261717]]
* 10:06 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1011 to reclone es1026 [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13132 and previous config saved to /var/cache/conftool/dbconfig/20201103-060054-marostegui.json
* 09:22 ema@cumin2002: START - Cookbook sre.hosts.reimage for host cp4021.ulsfo.wmnet with OS buster
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1015 to es2 master (this is a noop) [[phab:T261717|T261717]]', diff saved to https://phabricator.wikimedia.org/P13131 and previous config saved to /var/cache/conftool/dbconfig/20201103-060038-marostegui.json
* 08:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:39 cstone: civicrm revision changed from {{Gerrit|cd13d9e30f}} to {{Gerrit|b1342c4129}}
* 08:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:13 shdubsh: restart ES on logstash1009 - oom killed
* 08:21 elukey: run kafka preferred-replica-election on kafka-main1001 to rebalance partition leaders - [[phab:T288825|T288825]]
* 01:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:15 godog: bounce graphite on graphite1004 to apply new config
* 00:59 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 07:33 elukey: increase kafka topic partition size of the top 4 high traffic topics of main-eqiad as described in https://phabricator.wikimedia.org/T288825#7422726
* 00:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:13 XioNoX: provision new eqsin-ulsfo link - [[phab:T273308|T273308]]
* 00:40 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 06:26 elukey: `kafka topics --alter --topic <nowiki>{</nowiki>eqiad,codfw<nowiki>}</nowiki>.change-prop.transcludes.resource-change --partitions 3` on kafka-main2001 - [[phab:T288825|T288825]]
* 00:38 ejegg: updated payments-wiki from {{Gerrit|030b11da1a}} to {{Gerrit|b329d2dea2}}


== 2020-11-02 ==
== 2021-10-12 ==
* 22:19 twentyafterfour: restart php7.3-fpm on phab1001
* 23:48 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:03 twentyafterfour: applied {{Gerrit|113a244a66}} on phab1001 to hotfix [[phab:T240862|T240862]]
* 23:16 urbanecm: UTC late B&C window done
* 20:22 eileen: process-control config revision is {{Gerrit|313a36312f}} re-enable thank you
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 01m 04s)
* 19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:12 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|59c31d9046a68e73b07d8179ac569425d18dcf73}}: Change logo in astwiki ([[phab:T292742|T292742]]) (duration: 02m 09s)
* 19:48 dzahn@cumin1001: START - Cookbook sre.dns.netbox
* 23:05 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:47 eileen: civicrm revision changed from {{Gerrit|3317d30356}} to {{Gerrit|cd13d9e30f}}, config revision is {{Gerrit|db912e3bba}}
* 22:53 urbanecm: [urbanecm@labweb1001 ~]$ mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=labswiki Jamesmontalvo3 #
* 19:45 eileen: process-control config revision is {{Gerrit|db912e3bba}} - thankyou job off for testing
* 22:51 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 19:07 Urbanecm: Deployed security fix for [[phab:T205908|T205908]]
* 20:21 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 19:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 19:31 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.1 (duration: 04m 02s)
* 18:59 andrewbogott: added dcaro to ops and wmf ldap groups
* 19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:59 mutante: decom'ing testvm1001
* 19:08 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:58 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 18:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:47 dancy@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]] (duration: 45m 36s)
* 18:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:12 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
* 18:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:01 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.4  refs [[phab:T281168|T281168]]
* 18:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:56 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/CentralNotice: Backport: [[gerrit:730141]] (duration: 00m 59s)
* 18:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:14 XioNoX: push new pfw policies - [[phab:T267051|T267051]]
* 17:46 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
* 16:39 ejegg: updated payments-wiki from {{Gerrit|adc3369cb3}} to {{Gerrit|1ad4ba9639}}
* 17:43 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:32 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SyntaxHighlight_GeSHi/includes/ResourceLoaderPygmentsModule.php: Backport: [[gerrit:730233{{!}}Include generated styles before Mediawiki overrides (T292736)]] (duration: 00m 57s)
* 16:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:36 moritzm: imported php-excimer/php-luasandbox to component/php72 for buster-wikimedia
* 17:23 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730236{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 14:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:34 moritzm: rolling restart of cassandra in restbase-dev to pick up Java security updates
* 17:16 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/includes/actions/pagers/HistoryPager.php: Backport: [[gerrit:730235{{!}}Fix history page iteration in backwards mode (T292791)]] (duration: 00m 57s)
* 14:17 kormat: uploaded orchestrator 3.2.3-1 to apt
* 17:12 moritzm: installing rsync bugfix updates
* 14:01 hashar@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove $wgExtDistListFile, unused - [[phab:T266024|T266024]] (duration: 00m 58s)
* 17:09 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 13:46 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 16:56 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 13:40 elukey: roll restart zookeeper ok an-conf* to pick up new openjdk upgrades
* 16:55 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 13:40 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 16:53 moritzm: failed over ganeti master for test cluster to ganeti2025
* 13:03 Lucas_WMDE: EU backport&config window done
* 16:50 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 13:02 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/Wikibase: Backport: [[gerrit:637801{{!}}Revert JS parser commits (T266671)]] (duration: 01m 09s)
* 16:48 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 12:52 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:637819{{!}}Add Response namespace at otrs_wikiwiki to namespaces searched by default (T266917)]] (duration: 00m 58s)
* 16:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:21 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:634224{{!}}Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]], 2/2 (Beta) (duration: 00m 57s)
* 16:30 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts testvm2009.codfw.wmnet
* 12:20 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:634224{{!}}Stop defining wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]], 1/2 (production) (duration: 01m 02s)
* 16:30 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 12:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:638020{{!}}Stop reading wmgULSCompactLinksForNewAccounts and wmgULSCompactLinksEnableAnon]] (duration: 00m 58s)
* 16:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:15 volans: upgraded python3-wmflib to 0.0.4 on cumin[12]001
* 16:26 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 12:09 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:637778{{!}}Fix array depth for properties array (T266835)]], Beta part (prod no-op) (duration: 00m 58s)
* 16:26 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/includes: Backport: [[gerrit:730226{{!}}Pre-format comments for non-local files too (T292570)]] (duration: 01m 15s)
* 12:07 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:637778{{!}}Fix array depth for properties array (T266835)]] (duration: 00m 59s)
* 16:17 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 12:02 volans: uploaded python3-wmflib_0.0.4 to apt.wikimedia.org buster-wikimedia
* 16:16 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2009.codfw.wmnet
* 11:51 effie: disable puppet on  thumbor1001 and thumbor1002 to test 636024
* 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:51 effie: disable thumbor on thumbor1001 and thumbor1002 to test 636024
* 16:10 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 11:34 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:638045{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 16:09 volans@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2009.codfw.wmnet
* 11:33 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:638045{{!}} Bumping portals to master (T128546)]] (duration: 01m 00s)
* 16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:06 dancy@deploy1002: Synchronized php-1.38.0-wmf.4/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730231{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 00m 57s)
* 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:00 volans@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2009.codfw.wmnet
* 11:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:06 godog: upgrade thanos to 0.16.0 on prometheus hosts - [[phab:T261281|T261281]]
* 15:58 dancy@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/SecurePoll/includes/Hooks/HookRunner.php: Backport: [[gerrit:730230{{!}}Fix wrong var being passed (T289950 T293102)]] (duration: 02m 13s)
* 10:59 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 15:57 volans@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2009.codfw.wmnet
* 10:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.decommission
* 15:51 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 15:49 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 10:28 oblivian@cumin1001: START - Cookbook sre.network.cf
* 15:48 volans@cumin2002: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host testvm2009.codfw.wmnet
* 10:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 15:48 volans@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2009.codfw.wmnet
* 10:28 oblivian@cumin1001: START - Cookbook sre.network.cf
* 15:41 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for analytics1069.eqiad.wmnet
* 10:23 moritzm: installing openldap security updates on corp LDAP replicas
* 15:41 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for analytics1069.eqiad.wmnet
* 08:46 XioNoX: add uRPF strict to ulsfo office links - [[phab:T266561|T266561]]
* 15:02 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:41 moritzm: installing openldap security updates on LDAP replicas
* 14:50 volans@cumin2002: START - Cookbook sre.dns.netbox
* 08:40 godog: upgrade thanos to 0.16 in codfw/eqiad - [[phab:T261281|T261281]]
* 13:49 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 06:09 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2006.codfw.wmnet
* 06:09 oblivian@cumin1001: START - Cookbook sre.network.cf
* 13:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
* 06:09 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:09 oblivian@cumin1001: START - Cookbook sre.network.cf
* 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:14 godog: add 50G to prometheus/k8s in eqiad
* 13:13 otto@deploy1002: Synchronized wmf-config/CommonSettings.php: Enable x_client_ip_forwarding_enabled for eventgate-analytics and eventgate-analytics-external - [[phab:T288853|T288853]] (duration: 00m 56s)
* 13:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 13:11 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on analytics1069.eqiad.wmnet with reason: draining flea power [[phab:T291732|T291732]]
* 13:05 volans: upgraed spicerack to 1.0.5 on cumin hosts
* 12:25 volans: uploaded spicerack_1.0.5 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 12:15 elukey: `kafka topics --alter --topic codfw.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 12:15 elukey: `kafka topics --alter --topic eqiad.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 12:10 elukey: `kafka topics --alter --topic codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 12:09 elukey: `kafka topics --alter --topic eqiad.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite --partitions 5` - [[phab:T288825|T288825]]
* 11:58 elukey: `kafka topics --alter --topic codfw.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 11:49 elukey: `kafka topics --alter --topic eqiad.resource-purge --partitions 5` on kafka-main2001 - [[phab:T288825|T288825]]
* 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 11:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 11:34 urbanecm: UTC morning B&C window done
* 11:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|860ea0944d6dc1e6b5061eb84eec378eb5ac8441}}: Remove NS_MAIN from wgExtraSignatureNamespaces on most special wikis ([[phab:T291630|T291630]]) (duration: 00m 57s)
* 11:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:14 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:06 urbanecm@deploy1002: Synchronized w/static.php: {{Gerrit|e77ae17efb34723598fc69e87109944384df442a}}: static.php: correctly report a bad request (duration: 00m 57s)
* 11:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2003.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
* 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 10:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 10:30 ema: apply https://gerrit.wikimedia.org/r/726912 to all A:cp nodes [[phab:T288106|T288106]]
* 10:24 ema@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 10:23 ema: depool/repool ats-be on cp4028 to verify updates to /etc/varnish/directors.frontend.vcl on cp4027 keep on working fine [[phab:T288106|T288106]]
* 10:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:22 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4028.ulsfo.wmnet,service=ats-be
* 10:16 ema: cp4027: enable and run puppet to test https://gerrit.wikimedia.org/r/726912 [[phab:T288106|T288106]]
* 10:12 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti2025.codfw.wmnet with OS buster
* 09:16 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17456 and previous config saved to /var/cache/conftool/dbconfig/20211012-091614-kormat.json
* 09:01 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17455 and previous config saved to /var/cache/conftool/dbconfig/20211012-090111-kormat.json
* 08:46 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17454 and previous config saved to /var/cache/conftool/dbconfig/20211012-084607-kormat.json
* 08:31 kormat@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: repool db1127 [[phab:T292956|T292956]]', diff saved to https://phabricator.wikimedia.org/P17453 and previous config saved to /var/cache/conftool/dbconfig/20211012-083103-kormat.json
* 08:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:58 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|17dc3aa}}, {{Gerrit|e0ca905}}, {{Gerrit|c0f4f4e}}: GrowthExperiments backports ([[phab:T292224|T292224]], [[phab:T290609|T290609]], [[phab:T290609|T290609]]) (duration: 00m 59s)
* 07:40 elukey: run kafka preferred-replica-election on kafka-main2001 to rebalance partition leaders after the last topic moves - [[phab:T288825|T288825]]
* 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS buster
* 07:22 moritzm: installing RT security updates
* 04:43 eileen: civicrm revision changed from {{Gerrit|96090e4bd2}} to {{Gerrit|946dfb6c5a}}, config revision is {{Gerrit|85277466ed}}
* 03:56 kart_: cxserver: Remove Matxin Key from Production ([[phab:T292635|T292635]])
* 03:54 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:48 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:45 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 02:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:11 eileen: civicrm revision changed from {{Gerrit|598b59b0ee}} to {{Gerrit|96090e4bd2}}, config revision is {{Gerrit|85277466ed}}


== 2020-11-01 ==
== 2021-10-11 ==
* 22:41 Urbanecm: mwscript extensions/OATHAuth/maintenance/disableOATHAuthForUser.php --wiki=metawiki Turkmen # [[phab:T266976|T266976]]
* 21:25 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 09:52 ariel@deploy1001: Finished deploy [dumps/dumps@de4c823]: actually allow per run dir to be made early in the run (duration: 00m 04s)
* 20:58 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 09:52 ariel@deploy1001: Started deploy [dumps/dumps@de4c823]: actually allow per run dir to be made early in the run
* 17:08 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 09:16 ariel@deploy1001: Finished deploy [dumps/dumps@6c7d811]: create empty dir for tableinfo if needed (duration: 00m 04s)
* 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 09:16 ariel@deploy1001: Started deploy [dumps/dumps@6c7d811]: create empty dir for tableinfo if needed
* 15:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 01:26 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:31 jgleeson: smashpig updated from {{Gerrit|3607b16f83}} to {{Gerrit|dd3a81c7c2}}
* 01:26 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 01:16 rzl@cumin1001: dbctl commit (dc=all): 'Depool db1091', diff saved to https://phabricator.wikimedia.org/P13124 and previous config saved to /var/cache/conftool/dbconfig/20201101-011600-rzl.json
* 14:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm[2001-2002,2005].codfw.wmnet with reason: Ganeti tests
* 14:36 Emperor: start restoring weight to ms-be2045 [[phab:T290881|T290881]]
* 13:42 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 12:53 moritzm: install apache security updates on buster
* 12:49 topranks: Setting up BGP peering to AS12552 (GlobalConnect Group) at AMS-IX on cr2-esams
* 12:45 ema: cp4027: upgrade varnish to 6.0.8 [[phab:T292290|T292290]]
* 12:04 moritzm: install apache security updates on bullseye
* 10:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 09:50 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 09:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2003.codfw.wmnet
* 09:37 elukey: force kafka preferred-replica-election on kafka-main2001 after another batch of topic partitions moves - [[phab:T288825|T288825]]
* 09:13 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host graphite2003.codfw.wmnet
* 09:09 elukey: force kafka preferred-replica-election on kafka-main2001 after the first 50 topic partitions moves - [[phab:T288825|T288825]]
* 09:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 09:01 godog: bounce swift-object-replicator on ms-be2036
* 08:52 godog: bounce statsite on graphite1004 to apply unit config changes
* 08:48 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 08:41 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet
* 08:38 moritzm: updated buster d-i image for Bullseye 11.1 point release [[phab:T292844|T292844]]
* 08:38 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 08:26 godog: swift eqiad-prod: final weight to ms-be10[64-67] - [[phab:T290546|T290546]]
* 08:25 moritzm: updated buster d-i image for Buster 10.11 point release [[phab:T292838|T292838]]
* 08:24 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet
* 08:06 godog: bounce uwsgi on graphite hosts to bump request size limit - [[phab:T292877|T292877]]
* 07:58 volans: migrating physical hosts DHCP to the new reimage process - [[phab:T269855|T269855]]
* 07:57 elukey: start kafka topics rebalancing for main-codfw (long running maintenance) - [[phab:T288825|T288825]]


== 2020-10-31 ==
== 2021-10-09 ==
* 00:12 mutante: removed Nuria from wmf group, she is already in nda group ([[phab:T266086|T266086]])
* 05:01 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:28 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:32 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 00:46 mutante: ms-be2045 - started systemd-timedated which had been killed by something
* 00:28 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 00:24 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.force-unfreeze (exit_code=99)
* 00:23 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.force-unfreeze
* 00:13 ryankemper: [[phab:T292814|T292814]] Write queue stuck at 133 events in partition 1 of topic `codfw.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite`, will try again at another time
* 00:12 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]


== 2020-10-30 ==
== 2021-10-08 ==
* 23:35 foks: removing two files for legal compliance
* 23:16 legoktm: sudo cumin -b 10 C:mediawiki::packages 'apt-get purge lilypond-data -y'
* 23:32 mutante: adding query.wikidata.org to TLS cert for webserver-misc-apps.discovery.wmnet [[phab:T266702|T266702]]
* 23:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 23:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:38 mutante: mwmaint2002 - disable-puppet, stop bacula-fd, recovery in progress
* 23:04 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:34 mutante: disabling puppet on bacula - going through a restore https://wikitech.wikimedia.org/wiki/Bacula#Restore_from_a_non-existent_host_(missing_private_key)
* 23:04 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:30 legoktm: running puppet across C:mediawiki::packages to uninstall lilypond and ploticus: legoktm@cumin1001:~$ sudo cumin -b 4 C:mediawiki::packages 'run-puppet-agent'
* 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
* 23:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
* 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: REIMAGE
* 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 20:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: REIMAGE
* 23:02 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: REIMAGE
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: REIMAGE
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 19:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 21:02 jiji@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: REIMAGE
* 21:00 jiji@cumin2001: START - Cookbook sre.hosts.downtime
* 18:15 cstone: civicrm revision changed from {{Gerrit|5cb7d487cb}} to {{Gerrit|598b59b0ee}}
* 20:59 mutante: mw1267,mw1268 - scap pull and repool - back to prod - [[phab:T266164|T266164]]
* 16:19 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=enwiki --force # to measure performance on a large wiki
* 20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1267.eqiad.wmnet
* 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 20:56 mutante: mw1267,mw1268 - scap pull
* 15:29 jelto: enable puppet on gitlab1001 again for [[phab:T283076|T283076]]
* 20:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:05 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:01 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:49 Amir1: wikiadmin@10.64.16.85(wikidatawiki)> delete from wb_changes_subscription where cs_subscriber_id in ('testcommonswiki', 'mowiki');
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:39 Emperor: installing stress on ms-be2045 given recent h/w issues [[phab:T290881|T290881]]
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 08:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 08:04 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force
* 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:43 Emperor: reboot ms-be2045 [[phab:T290881|T290881]]
* 20:32 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 07:41 gehel: manually resuming the data reloads on wdqs1009 and wdqs2008
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 06:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 06:42 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 06:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 06:28 ayounsi@cumin2002: START - Cookbook sre.network.cf
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 05:35 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 04:56 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 20:31 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 04:32 ryankemper: [[phab:T292814|T292814]] Beginning rolling restart of `cloudelastic`: `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic restart" --nodes-per-run 1 --start-datetime 2021-10-08T03:53:49 --task-id [[phab:T292814|T292814]]` on `ryankemper@cumin1001` tmux `elastic`
* 20:06 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 04:31 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 20:04 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 04:29 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 18:48 cdanis: the above scap began (and mostly finished) several minutes ago but is hanging on a couple hosts down for maintenance
* 04:28 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across both test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 18:48 cdanis@deploy1001: Synchronized wmf-config/InitialiseSettings.php: lower frwiki featured feeds limit {{Gerrit|1a41ef634}} [[phab:T266865|T266865]] (duration: 05m 14s)
* 04:28 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 18:48 cdanis: ✔️ cdanis@deploy1001.eqiad.wmnet /srv/mediawiki-staging 🕝☕ scap sync-file wmf-config/InitialiseSettings.php 'lower frwiki featured feeds limit {{Gerrit|1a41ef634}} [[phab:T266865|T266865]]'
* 04:23 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@8f57a56]: 0.3.89 (duration: 08m 22s)
* 18:27 hashar@deploy1001: Finished deploy [integration/docroot@c35e5e9]: Add ECS to doc.wikimedia.org index (duration: 00m 06s)
* 04:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 18:27 hashar@deploy1001: Started deploy [integration/docroot@c35e5e9]: Add ECS to doc.wikimedia.org index
* 04:20 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 17:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 04:18 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 17:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 04:17 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 17:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 04:15 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.89` on canary `wdqs1003`; proceeding to rest of fleet
* 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 04:14 ryankemper@deploy1002: Started deploy [wdqs/wdqs@8f57a56]: 0.3.89
* 17:19 effie: disable puppet on mc1036 and mc2036 - [[phab:T252391|T252391]]
* 04:14 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.89`. Pre-deploy tests passing on canary `wdqs1003`
* 17:18 effie: enable puppet on all mediawiki and mc* hosts
* 03:58 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 16:19 elukey: kafka-jumbo1006 still running with 1g nick
* 03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic restart - ryankemper@cumin1001 - [[phab:T292814|T292814]]
* 15:36 effie: stopping puppet on mediawiki and mc* hosts
* 02:04 Krinkle: krinkle@deploy1002$ echo 'https://en.wikipedia.org/static/images/project-logos/jvwiktionary.png' {{!}} mwscript purgeList.php , ref [[phab:T287425|T287425]], [[phab:T292810|T292810]]
* 15:11 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:07 tgr_: deploy window over
* 15:11 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 00:05 tgr@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GrowthExperiments: Backport: [[gerrit:727498{{!}}Mentee overview: Make UncachedMenteeOverviewDataProvider::getBlocksForUsers faster (T290609)]] (duration: 00m 56s)
* 15:09 rzl: downtiming mc2036 for buster reimage
* 14:42 elukey: stop kafka-jumbo1006 to swap NICs (1g -> 10g, d1 -> d4 rack)
* 14:14 cmjohnson1: moving mw1267 and mw168 to rack A8 eqiad [[phab:T266164|T266164]]
* 12:29 XioNoX: set normal VRRP balancing on cr2-eqiad
* 10:08 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:08 klausman@cumin1001: START - Cookbook sre.hosts.downtime
* 10:02 ladsgroup@deploy1001: Synchronized static/images/project-logos: Revert: Changing logo of Wikidata for the brithday (duration: 01m 12s)
* 09:13 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:07 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 08:58 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:54 elukey: decom an-tool1006 (old analytics test vm) - [[phab:T255139|T255139]]
* 08:53 elukey@cumin1001: START - Cookbook sre.hosts.decommission


== 2020-10-29 ==
== 2021-10-07 ==
* 23:59 eileen: process-control config revision is {{Gerrit|6891d35bce}}
* 23:43 thcipriani@deploy1002: Synchronized wmf-config/logos.php: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 3/3 (duration: 00m 55s)
* 23:39 Urbanecm: Evening B&C window done
* 23:41 thcipriani@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 2/3 (duration: 00m 55s)
* 23:38 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikiquote --add-prefix=BROKEN --fix # [[phab:T266605|T266605]] # P13112
* 23:40 thcipriani@deploy1002: Synchronized static/images/project-logos: Config: [[gerrit:708065{{!}}Change Javanese Wiktionary logo (T287425)]] part 1/3 (duration: 00m 56s)
* 23:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ddb7e08e9c1d07f704c9f7585d8b6089f1895b5c}}: Add namespace aliases to Turkish Wikiquote ([[phab:T266605|T266605]]) (duration: 00m 57s)
* 23:30 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704170{{!}}Adding and use wordmark in trwikiquote (T286133)]] Part 2/2 (duration: 00m 56s)
* 23:36 eileen: process-control config revision is {{Gerrit|1114512f90}}
* 23:28 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikiquote-wordmark-tr.svg: Config: [[gerrit:704170{{!}}Adding and use wordmark in trwikiquote (T286133)]] Part 1/2 (duration: 00m 57s)
* 23:29 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikisource --add-prefix=BROKEN --fix # [[phab:T266606|T266606]] # P13111
* 21:35 urbanecm: Password reset for SUL User:LA2-bot ([[phab:T292793|T292793]])
* 23:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c3a8555154673c4c5a65f6ec2a1219d0832f48e0}}: Add namespace aliases to Turkish Wikisource ([[phab:T266606|T266606]]) (duration: 00m 56s)
* 20:43 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.3
* 23:23 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwikibooks --fix # [[phab:T266608|T266608]]
* 20:37 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.2  refs [[phab:T281167|T281167]]
* 23:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1800d11ec8c07ff6ccffe0fd03ce11e6786f8a6e}}: Add namespace aliases to Turkish Wikibooks ([[phab:T266608|T266608]]) (duration: 00m 57s)
* 20:35 cmooney@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 23:22 eileen: civicrm revision changed from {{Gerrit|e1d65b0f3a}} to {{Gerrit|3317d30356}}, config revision is {{Gerrit|d70fe02cb9}}
* 20:35 cmooney@cumin1001: START - Cookbook sre.network.cf
* 23:18 Urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=trwiktionary --fix    # [[phab:T266609|T266609]]
* 20:23 krinkle@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Gadgets/: {{Gerrit|I7c858b8c4bc}} (duration: 00m 56s)
* 23:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|090f75730727e7a3ca5a85af0ff9071213dd047f}}: Add namespace aliases to Turkish Wiktionary ([[phab:T266609|T266609]]) (duration: 00m 58s)
* 20:01 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Echo/: {{Gerrit|8a7ff05ba28f302adb581bf430a868bb815b4ffd}}: Revert "Use namespaced CentralAuthSessionProvider" (duration: 00m 57s)
* 22:35 mutante: mw1268 - depooled for [[phab:T266164|T266164]]
* 19:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/CentralAuth/: {{Gerrit|c01c2e4983bad8582ddd62aeb35ac9be852d493b}}: Revert "Namespace session providers" (duration: 00m 57s)
* 22:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 19:44 urbanecm: Backporting https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/727489, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Echo/+/727487 in an unsafe way -- exceptions at testwikis expected, wmf.3 is not deployed elsewhere, so this should be ok
* 22:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:37 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert all wikis to 1.38.0-wmf.2 ([[phab:T281167|T281167]])
* 22:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:33 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): variously blocked, rolling back to testwikis for safe deploy of backports
* 22:32 mutante: mw1269 rsyncd/ferm for scap proxy was enabled - mw1268 rsyncd/ferm for scan proxy was removed - deploy1001 scap-proxies dsh group was adjusted
* 19:14 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.38.0-wmf.2
* 22:21 mutante: replacing scap proxy for rack A7 eqiad because mw1268 needs to move physically ([[phab:T266164|T266164]])
* 19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 22:21 bstorm: updated packages for thirdparty/kubeadm-k8s-1-17 to prepare for install [[phab:T263284|T263284]]
* 19:03 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): unblocked, rolling to all wikis
* 22:10 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:50 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=test2wiki
* 22:08 razzi@cumin1001: START - Cookbook sre.hosts.downtime
* 18:46 sukhe: running authdns-update for [[phab:T292537|T292537]]
* 22:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:29 urbanecm: Morning B&C window done
* 22:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4a946c046ae17a520f8d3463a16b1435ceb4856c}}: Deploy Growth mentor dashboard to pilot wikis ([[phab:T278920|T278920]]) (duration: 01m 04s)
* 22:06 mutante: depooled mw1267 ([[phab:T266164|T266164]])
* 18:23 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|87e300137c14451949fac12c3ec89319305a423e}}: Deploy Growth features to test2wiki (duration: 01m 03s)
* 22:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
* 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87e300137c14451949fac12c3ec89319305a423e}}: Deploy Growth features to test2wiki (duration: 01m 04s)
* 22:04 mutante: scandium - puppet disabled again (but only until tomorrow), downtimed in Icinga, for ongoing parsoid tests from testreduce1001
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|31770f2b3660e7d7490c0a9ab66285c1f069732d}}: shwiki: Deploy Growth features to newcomers ([[phab:T278240|T278240]]) (duration: 01m 04s)
* 22:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|33526dfed148068585289f5ac501feda72068fd9}}: Stream config changes for android_daily_stats schema ([[phab:T286000|T286000]]) (duration: 01m 06s)
* 22:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:10 ejegg: updated payments-wiki from {{Gerrit|6d3560d083}} to {{Gerrit|030b11da1a}}
* 20:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:07 arnoldokoth: gitlab2001 re-image complete ([[phab:T283076|T283076]])
* 20:50 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 17:30 mutante: rebooting gitlab2001.wikimedia.org
* 20:23 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:56 arnoldokoth: down timing gitlab2001 for re-imaging ([[phab:T283076|T283076]])
* 20:17 herron@cumin1001: START - Cookbook sre.dns.netbox
* 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2001.wikimedia.org with reason: reimage
* 20:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2001.wikimedia.org with reason: reimage
* 20:08 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:32 hnowlan: roll restarting maps cassandra instances for java updates
* 20:06 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:19 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 16:19 ayounsi@cumin2002: START - Cookbook sre.network.cf
* 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 16:18 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=99)
* 20:06 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 16:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 19:31 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 16:18 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.cf (exit_code=99)
* 19:31 cdanis@cumin1001: START - Cookbook sre.network.cf
* 16:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 19:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:07 hashar@deploy1002: Finished deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit1001 (duration: 00m 08s)
* 19:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:07 hashar@deploy1002: Started deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit1001
* 19:22 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session on mwmaint1002 (wiki=ukwiki; [[phab:T246539|T246539]])
* 14:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:13 Amir1: rolling restart of ores uwsgi
* 14:49 hashar@deploy1002: Finished deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit2001 (duration: 00m 10s)
* 19:00 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:49 hashar@deploy1002: Started deploy [gerrit/gerrit@13cef9f]: Gerrit to 3.3.6 on gerrit2001
* 18:58 robh@cumin1001: START - Cookbook sre.hosts.downtime
* 14:48 hashar: Upgrading Gerrit replica to 3.3.6 # [[phab:T290236|T290236]]
* 18:16 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 14:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WikiLove on hewikiquote ([[phab:T266744|T266744]]) (duration: 00m 57s)
* 14:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:09 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 13:56 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 18:07 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 13:46 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 18:06 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 13:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:06 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 13:30 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 18:06 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master * u=)]$ sudo /usr/local/sbin/fix-staging-perms
* 13:29 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 18:05 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewikiquote wikilove # [[phab:T266744|T266744]]
* 13:29 hashar: restarting CI Jenkins for git plugin update
* 18:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b7eaaab81e1665c478f5dc1fdb495e36c53e7863}}: [cswiki] Set wgGEHomepageManualAssignmentMentorsList to Wikipedie:Potřebuji pomoc/Mentoři/Manuální ([[phab:T245639|T245639]]) (duration: 00m 57s)
* 13:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 13:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:48 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 13:14 hashar: Upgraded CI Jenkins on contint2001
* 17:29 hashar: Restarted CI Jenkins a bit ago
* 13:14 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:15 hashar: CI: killed all java  agents (java upgrade)
* 13:13 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:12 hashar: Stopping CI Jenkins
* 13:10 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:59 XioNoX: Delete cr1-eqiad:ae2.1120 and related static routes - [[phab:T265288|T265288]]
* 13:09 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:46 _joe_: restarted kartotherian on all servers in eqiad at the same time
* 13:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:38 XioNoX: Move cr2-eqiad:ae2.1120 to cloudsw1-d5:irb.1120 - [[phab:T265288|T265288]]
* 13:06 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 16:34 XioNoX: force VRRP master on cr1-eqiad - [[phab:T265288|T265288]]
* 13:05 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
* 13:05 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:25 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet
* 12:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:34 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Revert: switch restbase to use envoy, https (duration: 00m 57s)
* 12:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:22 moritzm: installing bacula updates from Buster point release
* 12:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:22 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/intersection/: {{Gerrit|483c3bceb926ac6a2cfc40112fb9b4f0671fef72}}: Attempt to add a query cache to DPL ([[phab:T263220|T263220]]) (duration: 00m 58s)
* 12:40 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 15:16 papaul: poweroff mc2029 for relocation
* 12:16 moritzm: installing testvm2005
* 15:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|19c5aff02c20812c56b8abdcc0ed530393010193}}: Set wgDLPQueryCacheTime to 120 at all wikis ([[phab:T263220|T263220]]) (duration: 00m 59s)
* 11:59 moritzm: installing openssl security updates for stretch (buster/bullseye already fixed)
* 15:09 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Switch restbase to use envoy, https (duration: 00m 57s)
* 11:52 Lucas_WMDE: EU backport+config window (aka UTC morning) done
* 15:06 vgutierrez: rolling restart of ATS to upgrade to trafficserver 8.0.8-1wm3 - [[phab:T265911|T265911]]
* 11:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:59 papaul: poweroff sessionstore2002 for relocation
* 11:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725858{{!}}Enable Content and Section Translation to Kurdish WP (T290238)]] (duration: 01m 04s)
* 14:36 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 11:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:35 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 11:44 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/WikidataPageBanner/includes/WikidataPageBannerFunctions.php: Backport: [[gerrit:727188{{!}}Change PropertyId to NumericPropertyId (T289125, T292667)]] (duration: 01m 05s)
* 14:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 11:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:33 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:29 jmm@cumin1001: START - Cookbook sre.hosts.decommission
* 11:10 jbond: update puppet stdlib gerrit:726872
* 14:26 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 09:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2004.codfw.wmnet
* 14:24 elukey: restart zookeeper on an-conf1001 for openjdk upgrades
* 09:26 mvernon@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host ms-be2045.codfw.wmnet
* 14:20 jmm@cumin1001: START - Cookbook sre.hosts.decommission
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2005.codfw.wmnet
* 14:08 godog: bump FS for prometheus codfw global instance
* 09:19 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2004.codfw.wmnet
* 13:54 elukey: roll out profile::java on all zookeeper instances
* 09:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2005.codfw.wmnet
* 13:53 moritzm: installing Java 11 security updates
* 08:49 mvernon@cumin2002: START - Cookbook sre.experimental.reimage for host ms-be2045.codfw.wmnet
* 13:52 bblack: authdns1001 - restart gdnsd - [[phab:T266746|T266746]]
* 08:36 moritzm: imported jenkins 2.303.2 to thirdparty/ci component for buster-wikimedia
* 13:46 bblack: authdns2001 - restart gdnsd - [[phab:T266746|T266746]]
* 07:57 Emperor: re-enabling puppet on ms-be2045 after hw work [[phab:T290881|T290881]]
* 13:38 bblack: staggered restart of gdnsd on dns[12345]001 (1/2 recursors in each DC) - [[phab:T266746|T266746]]
* 07:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 13:29 bblack: staggered restart of gdnsd on dns[12345]002 (1/2 recursors in each DC) - [[phab:T266746|T266746]]
* 07:39 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 13:25 Urbanecm: Correction: Obviously 1002 ([[phab:T246539|T246539]])
* 07:38 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 13:23 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint2001 (wiki=idwiki; [[phab:T246539|T246539]])
* 07:37 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 13:21 moritzm: installing bluez security updates on stretch
* 07:34 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 12:56 marostegui: Make orchestrator discover pc2 [[phab:T266485|T266485]]
* 07:33 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 12:55 marostegui: Deploy orchestrator grants on pc2 [[phab:T266485|T266485]]
* 07:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 12:44 marostegui: Deploy grants for cluster alias on pc1 [[phab:T266485|T266485]]
* 07:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 12:35 moritzm: upgrade idp-test* hosts to latest Java securiy updates
* 06:21 ryankemper: [Elastic] Restart of `relforge` complete
* 12:35 moritzm: restart idp-test
* 06:05 ryankemper: [Elastic] Cluster in green status, proceeding to next and final node => `ryankemper@relforge1003:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad-small-alpha.service && sudo systemctl restart elasticsearch_6@relforge-eqiad.service`
* 12:34 ariel@deploy1001: Finished deploy [dumps/dumps@4ed2cb9]: revinfo for page content jobs, tableinfo for list of known tables (duration: 00m 05s)
* 05:53 ryankemper: [Elastic] `ryankemper@relforge1004:~$ sudo systemctl restart elasticsearch_6@relforge-eqiad-small-alpha.service && sudo systemctl restart elasticsearch_6@relforge-eqiad.service`
* 12:33 ariel@deploy1001: Started deploy [dumps/dumps@4ed2cb9]: revinfo for page content jobs, tableinfo for list of known tables
* 05:48 ryankemper: [Elastic] Performing rolling restarts of `relforge`. `relforge1003` is the master so I'll restart `relforge1004` first to minimize disruption
* 12:01 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 03:00 ejegg: updated payments-wiki from {{Gerrit|23d0ffac66}} to {{Gerrit|6d3560d083}}
* 11:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 02:28 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: enable Parsoid API everywhere (duration: 01m 04s)
* 11:14 Urbanecm: EU B&C window done
* 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|28152b7387082b79d71cfbf28be740ffe629ee50}}: Add another SDC property to search for matching media statements ([[phab:T264925|T264925]]) (duration: 00m 58s)
* 00:11 mutante: [grafana2001:~] $ sudo systemctl start rsync-var-lib-grafana  because of "PROBLEM - Check systemd state on grafana2001 is CRITICAL: CRITICAL - degraded" because of some race condition where a file vanished during sync
* 11:11 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:07 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:07 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:06 klausman@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:06 klausman@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 10:15 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 10:12 elukey: restart tilerator on maps100[1,4] - redis errors in the logs
* 10:11 elukey: restart tilerator on maps1002 - redis errors in the logs
* 10:03 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:03 elukey: drop 10.64.21.6/24 and 2620:0:861:105:10:64:21:6/64 from netbox (an-tool-ui1001 related records)
* 09:59 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: Fix cxserver's configuration to use envoy (duration: 00m 59s)
* 09:52 elukey: add gdnsd.service to all gdnsd hosts (with LimitNOFILE=infinity as override) - no daemon restart done - [[phab:T266746|T266746]]
* 09:41 marostegui: Deploy schema change on s8 wikidata codfw master (db2079) [[phab:T264109|T264109]]
* 09:33 elukey: clean up 10.64.21.7/24 and 2620:0:861:105:10:64:21:7/64 from netbox (an-test-ui1001 already have ips previously allocated by makevm)
* 09:32 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 09:23 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:54 vgutierrez: turn off ECDHE-ECDSA-AES128-SHA support on the main caching cluster - [[phab:T258405|T258405]]
* 08:54 moritzm: fixing up stray jenkins auto restart timers on secondary releases server
* 08:53 vgutierrez: A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 [[phab:T266567|T266567]] [[phab:T264398|T264398]]
* 08:48 moritzm: fixing up stray mcelog auto restart timers on kubestage*
* 08:38 moritzm: fixing up stray cas auto restart timers on secondary IDP servers
* 08:19 moritzm: fixing up stray pmacctd auto restart timers on netflow*
* 08:19 moritzm: fixing up stray pcacctd auto restart timers on netflow*
* 08:02 marostegui: Disconnect replication codfw -> eqiad on s1 [[phab:T266663|T266663]]
* 07:56 vgutierrez: set LimitNOFILE=500000 for gdnsd on authdns1001
* 07:54 marostegui: Disconnect replication codfw -> eqiad on s4 [[phab:T266663|T266663]]
* 07:50 vgutierrez: restart haproxy on authdns2001
* 07:49 marostegui: Disconnect replication codfw -> eqiad on s8 [[phab:T266663|T266663]]
* 07:48 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 07:46 marostegui: Disconnect replication codfw -> eqiad on s3 [[phab:T266663|T266663]]
* 07:43 vgutierrez: restart anycast-healthchecker on authdns2001
* 07:34 vgutierrez: set LimitNOFILE=500000 for gdnsd on authdns2001
* 07:27 elukey: "sudo truncate -s 10g /var/log/daemon.log" on authdns2001
* 06:52 marostegui: Disconnect replication codfw -> eqiad on s2 [[phab:T266663|T266663]]
* 06:38 marostegui: Disconnect replication codfw -> eqiad on s7 [[phab:T266663|T266663]]
* 06:36 marostegui: Disconnect replication codfw -> eqiad on s6 [[phab:T266663|T266663]]
* 06:25 elukey: execute 'truncate -s 10g /var/log/syslog.1 on authdns2001 - root partition full
* 06:23 marostegui: Disconnect replication codfw -> eqiad on s5 [[phab:T266663|T266663]]
* 06:10 marostegui: Disconnect replication codfw -> eqiad on es4 and es5 [[phab:T266663|T266663]]
* 06:07 marostegui: Disconnect replication codfw -> eqiad on x1 [[phab:T266663|T266663]]
* 05:58 marostegui: Disconnect replication codfw -> eqiad on pc1, pc2 and pc3 [[phab:T266663|T266663]]
* 04:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 01:41 mutante: scandium reimaged a second time after making puppet changes to ensure nodejs/npm is NOT installed anymore ([[phab:T257906|T257906]])
* 01:17 ryankemper: [[phab:T266492|T266492]] Beginning rolling restart of eqiad cirrus cluster, 3 nodes at a time, on `ryankemper@cumin1001` tmux session `elasticsearch_restart_eqiad`
* 01:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-restart
* 00:51 ryankemper: Finished restart of wdqs categories across production hosts; wdqs deploy is complete and the service is healthy
* 00:14 Amir1: rolling restart of ores
* 00:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:04 ryankemper: Beginning restart of wdqs categories across production hosts, one at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 60 && systemctl restart wdqs-categories && sleep 30 && pool'`
* 00:03 ryankemper: Restarted wdqs categories across test hosts: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 00:03 ryankemper: Restarted wdqs updater across all hosts: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 00:02 ryankemper: Following wdqs deploy, https://query.wikidata.org successfully responds to an example query
* 00:01 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@8c97b17]: 0.3.53 (duration: 09m 29s)


== 2020-10-28 ==
== 2021-10-06 ==
* 23:54 ryankemper: Canary `wdqs1003` tests pass, proceeding with wdqs deploy to rest of fleet
* 23:57 mutante: releases2002 - rm /srv/org/wikimedia/reprepro/conf/distributions - contains only jessie-mediawiki - see 725670 and EOL of MediaWiki 1.31
* 23:52 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8c97b17]: 0.3.53
* 23:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:52 ryankemper@deploy1001: deploy aborted:  0.3.53 (duration: 00m 00s)
* 23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:52 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8c97b17]:  0.3.53
* 23:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:54 mutante: scandium - scap pull after reinstalling OS
* 23:21 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:726955{{!}}Adding and use wordmark in ckbwiki (T288368)]] (duration: 01m 04s)
* 22:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:20 jforrester@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ckb.svg: Config: [[gerrit:726955{{!}}Adding and use wordmark in ckbwiki (T288368)]] (duration: 01m 04s)
* 22:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:41 ryankemper: Disabled elasticsearch "saneitizer" systemd timer in eqiad due to checker jobs falling behind: `sudo systemctl disable mediawiki_job_cirrus_sanitize_jobs.timer` on `mwmaint1002`
* 23:16 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:726603{{!}}Enable NewUserMessage for ptwikivoyage (T290820)]] (duration: 01m 05s)
* 21:22 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 22:30 mutante: re-enabling puppet on mw*, an-worker* after deploying gerrit:726954. no issue this time
* 21:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:23 mutante: temp. disabling puppet on an-worker*, mw*
* 21:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 20:50 mutante: global puppet failure - revert is merged, puppet run will recover on next run everywhere. partially forcing with cumin, partially letting it recover naturally
* 20:50 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:43 mutante: [cumin1001:~] $ sudo cumin -b 5 -p 95 'mw2*' 'run-puppet-agent -q --failed-only'
* 20:22 ladsgroup@deploy1001: Synchronized static/images/project-logos: Changing logo of Wikidata for the brithday (duration: 00m 58s)
* 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:56 jgleeson: updated Smashpig from {{Gerrit|2246685626}} to {{Gerrit|09f29c1da5}}
* 19:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:53 herron@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 19:05 brennen@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]] (duration: 01m 03s)
* 19:53 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:04 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 19:50 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 19:01 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): still unblocked after triage meeting, rolling to group1
* 19:36 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:36 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 18:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:36 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:44 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Revert disabling static mapframes on eswiki (duration: 01m 14s)
* 19:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:31 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eswiki: Disable static mapframes ([[phab:T291736|T291736]]) (duration: 01m 17s)
* 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:56 tgr_: Morning deploys done
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:55 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636983{{!}}Temporary enable 'editpage' warn logging (T251023)]] (duration: 00m 57s)
* 18:22 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: viwikibooks: Set $wgRestrictDisplayTitle to false ([[phab:T289837|T289837]]) (duration: 01m 21s)
* 18:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:47 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:53 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:46 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:636791{{!}}Revert "cirrus: Hardcode more_like to codfw cirrus cluster"]] (duration: 00m 56s)
* 16:47 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 18:45 tgr@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: Config: [[gerrit:636956{{!}}Revert "Revert "Increase cirrus morelike pool counter by 20%"" ()]] (duration: 00m 57s)
* 16:47 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:43 volans@cumin1001: START - Cookbook sre.dns.netbox
* 16:43 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): unblocked, rolling to group0
* 18:40 tgr@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/includes/HomepageModules/SuggestedEdits.php: Backport: [[gerrit:636787{{!}}Suggested edits: Include page ID with task preview data (T266600)]] (duration: 00m 59s)
* 16:41 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 18:19 tgr@deploy1001: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:619880{{!}}Removing obsolete license definition]] (duration: 01m 00s)
* 16:35 brennen@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/Scribunto/includes/engines/LuaCommon/LanguageLibrary.php: Backport: [[gerrit:726596{{!}}Replace deprecated ParserOptions::getUser with ::getUserIdentity (T292589)]] (duration: 01m 04s)
* 18:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:35 jynus: stopping db1127 for hw maintenance [[phab:T292366|T292366]]
* 18:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:31 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: hw maintenance
* 18:06 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:31 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: hw maintenance
* 18:02 elukey@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 16:28 brennen@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Scribunto/includes/engines/LuaCommon/LanguageLibrary.php: Backport: [[gerrit:726597{{!}}Replace deprecated ParserOptions::getUser with ::getUserIdentity (T292589)]] (duration: 01m 10s)
* 17:46 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 16:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:30 hnowlan: reimporting OSM data for eqiad
* 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:24 hnowlan: removing OSM database on maps1004
* 16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:01 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1002.eqiad.wmnet
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:45 brennen: 1.38.0-wmf.3 train ([[phab:T281167|T281167]]): proceeding to deploy backports for [[phab:T292589|T292589]]
* 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:37 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1002.eqiad.wmnet
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:35 volans: installer spicerack 1.0.4 on cumin2002
* 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:50 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:48 volans: uploaded spicerack_1.0.4 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1004.eqiad.wmnet
* 12:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2004.codfw.wmnet
* 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1004.eqiad.wmnet
* 12:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:18 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kartotherian,service=kartotherian,name=maps1004.eqiad.wmnet
* 12:18 effie: pool mw1455 mw1422
* 16:16 hnowlan: Disabling tilerator in eqiad
* 12:17 urbanecm: wikiadmin@10.64.0.164(viwiki)> delete from growthexperiments_mentee_data; # cleanup after disabling mentor dashboard backend
* 16:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2004.codfw.wmnet
* 16:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1aa67d4846f39f59127a835cb7a8ed2974506025}}: viwiki: Disable mentor dashboard backend ([[phab:T278920|T278920]]) (duration: 01m 06s)
* 16:06 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:05 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 11:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 16:03 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 11:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2003.codfw.wmnet
* 15:51 Amir1: restarting uwsgi on ores in eqiad
* 11:55 XioNoX: esams - Advertise 185.15.59.0/24 instead of 185.15.58.0/23 - [[phab:T288505|T288505]] - [[phab:T283050|T283050]]
* 15:49 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:46 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2003.codfw.wmnet
* 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 15:24 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:50 jelto: disable puppet on gitlab1001 to test puppetized code on GitLab replica - [[phab:T283076|T283076]]
* 15:24 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 15:23 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 10:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 15:10 godog: roll restart logstash5 in codfw
* 10:06 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:50 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:05 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 10:04 urbanecm@deploy1002: Synchronized wmf-config/: {{Gerrit|01633739462f3bf09ae4e50b955454921ea4fbf9}}: Delete gettingstarted-with-category-suggestions dblist ([[phab:T235752|T235752]]; 2/2) (duration: 01m 05s)
* 13:54 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 10:01 urbanecm@deploy1002: Synchronized dblists/: {{Gerrit|01633739462f3bf09ae4e50b955454921ea4fbf9}}: Delete gettingstarted-with-category-suggestions dblist ([[phab:T235752|T235752]]; 1/2) (duration: 01m 04s)
* 12:39 moritzm: installing libdatetime-timezone-perl  updates
* 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 11:46 XioNoX: configure urpf strict log-only on cr3-ulsfo:et-0/0/1.501 - [[phab:T266561|T266561]]
* 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 10:39 ema: due to [[phab:T266651|T266651]], cancel the entry above: A:cp upgrade libvmod-netmapper to 1.9-1 [[phab:T266567|T266567]] [[phab:T264398|T264398]]
* 09:19 jbond: update ipaddress6 fact - https://gerrit.wikimedia.org/r/c/operations/puppet/+/726625
* 10:38 elukey: clean up 10.64.5.7 and 2620:0:861:104:10:64:5:7 from Netbox (records mistakely allocated via the makevm cookbook) - [[phab:T266648|T266648]]
* 09:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:35 elukey@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97)
* 09:13 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/Wikibase/repo/includes/ChangeModification/DispatchChangesJob.php: Backport: [[gerrit:725923{{!}}Don't fail job if subscribed wiki is unknown (T292446 T292440)]] (duration: 01m 15s)
* 10:25 ema: A:cp (except cp3052, running varnish 5) upgrade libvmod-netmapper to 1.9-1 [[phab:T266567|T266567]] [[phab:T264398|T264398]]
* 09:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:20 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:29 volans@cumin2002: END (PASS) - Cookbook sre.experimental.reimage (exit_code=0) for host sretest1001.eqiad.wmnet
* 09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:21 XioNoX: add ROAs for 185.15.58.0/24 and 185.15.59.0/24 - [[phab:T288505|T288505]] - [[phab:T283050|T283050]]
* 09:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:04 volans@cumin2002: START - Cookbook sre.experimental.reimage for host sretest1001.eqiad.wmnet
* 09:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 07:56 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php plwikinews --fix # [[phab:T291344|T291344]]
* 09:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 07:56 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php plwikinews # [[phab:T291344|T291344]]
* 09:26 jayme: imported kubeyaml 0.0.3~20201027+git5f5556c-1 to buster-wikimedia
* 07:55 urbanecm: mwdebug1001: scap pull ([[phab:T291344|T291344]] fix done)
* 09:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:51 urbanecm: Staging at mwdebug1001 for [[phab:T291344|T291344]]
* 09:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 05:53 kart_: Updated cxserver to use nodejs12 ([[phab:T290754|T290754]])
* 08:37 jynus: updated dump grants on db2093
* 05:47 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 07:53 volans: upgraded python3-wmflib to 0.0.3 on the cumin hosts - [[phab:T257905|T257905]]
* 05:39 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 07:40 godog: update thanos-fe1002 to thanos 0.16.0 - [[phab:T261281|T261281]]
* 05:36 Amir1: start of mwscript extensions/Wikibase/repo/maintenance/pruneChanges.php --wiki wikidatawiki --number-of-days=2
* 07:22 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 05:31 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:43 ryankemper: [[phab:T266492|T266492]] Finished rolling restart of codfw cirrus cluster
* 04:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:43 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-restart (exit_code=0)
* 04:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:58 ryankemper: [[phab:T266492|T266492]] Beginning rolling restart of codfw cirrus cluster, 3 nodes at a time, on `ryankemper@cumin2001` tmux session `elasticsearch_restart_codfw`
* 04:29 ryankemper: [WDQS] `wdqs1012` is back up after restarting blazegraph (blazegraph was locked up)
* 02:57 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-restart
* 04:27 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` (attempting to bring downed `wdqs1012` back into health)
* 02:12 eileen: tools revision changed from {{Gerrit|a2a91d6c6a}} to {{Gerrit|087a596d3a}}
* 04:25 ryankemper: [WDQS] Repooling eqiad hosts following the brief outage from earlier: `wdqs1004`, `wdqs1006`, `wdqs1007`
* 00:40 eileen: civicrm revision changed from {{Gerrit|4fdfb8408b}} to {{Gerrit|e1d65b0f3a}}, config revision is {{Gerrit|f16003ab62}}
* 03:19 eileen: civicrm revision changed from {{Gerrit|b6f5f71c18}} to {{Gerrit|82efd2e195}}, config revision is {{Gerrit|f4c57d4733}}
* 03:11 tstarling@deploy1002: Synchronized php-1.38.0-wmf.3/includes/CommentFormatter/RowCommentIterator.php: fix UBN [[phab:T292590|T292590]] (duration: 01m 04s)
* 01:39 legoktm: legoktm@mwmaint1002:~$ echo "https://en.wikiversity.org/static/images/mobile/copyright/wikiversity.svg" {{!}}mwscript purgeList.php
* 01:17 arlolra@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/GlobalUserPage/includes/GlobalUserPage.php: Bump GlobalUserPage::PARSED_CACHE_VERSION for media DOM changes (duration: 01m 03s)
* 01:12 arlolra@deploy1002: Synchronized php-1.38.0-wmf.3/extensions/GlobalUserPage/includes/GlobalUserPage.php: Bump GlobalUserPage::PARSED_CACHE_VERSION for media DOM changes (duration: 01m 17s)
* 00:59 arlolra@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable legacy media dom on metawiki (duration: 01m 05s)
* 00:37 arlolra@deploy1002: Synchronized php-1.38.0-wmf.2/includes/resourceloader/ResourceLoaderSkinModule.php: Add a separate config for content.media.less 2/2 (duration: 01m 03s)
* 00:35 arlolra@deploy1002: Synchronized php-1.38.0-wmf.2/includes/DefaultSettings.php: Add a separate config for content.media.less 1/2 (duration: 01m 03s)
* 00:32 arlolra@deploy1002: Synchronized php-1.38.0-wmf.3/includes/resourceloader/ResourceLoaderSkinModule.php: Add a separate config for content.media.less 2/2 (duration: 01m 03s)
* 00:29 arlolra@deploy1002: Synchronized php-1.38.0-wmf.3/includes/DefaultSettings.php: Add a separate config for content.media.less 1/2 (duration: 01m 04s)
* 00:16 mutante: puppetmasters: rm /etc/logrotate.d/geoipupdate && systemctl start logrotate && puppet agent -tv
* 00:14 mutante: puppetmaster2002 - rm /etc/logrotate.d/geoipupdate (not managed by puppet anymore but not removed, caused duplicate logrotate config, made logrotate service fail), start logrotate
* 00:08 cstone: civicrm revision changed from {{Gerrit|34d3c3aae8}} to {{Gerrit|b6f5f71c18}}
* 00:01 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725132{{!}}Add WN as an alias to project namespace in Polish Wikinews (T291344)]] (duration: 01m 04s)


== 2020-10-27 ==
== 2021-10-05 ==
* 22:20 mutante: systemctl reset-failed on various servers to see which are coming back later from failed auto_restart and which don't
* 23:54 tgr@deploy1002: Synchronized static/images/mobile/copyright/wikiversity.svg: Config: [[gerrit:725413{{!}}Wikiversity Logo Update for 2017 Logo Version (T292109)]] (duration: 01m 03s)
* 21:40 mutante: mwmaint2001 - systemctl reset-failed - mediawiki_job_parser_cache_purging.service
* 23:47 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704376{{!}}Adding and use wordmark in azwiki (T284877)]] (duration: 01m 04s)
* 20:56 mutante: ms-be1057 is network down but running, NO-CARRIER on NIC, cable disconnected?
* 23:44 tgr@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-az.svg: Config: [[gerrit:704376{{!}}Adding and use wordmark in azwiki (T284877)]] (duration: 01m 23s)
* 20:43 mutante: releases2002 - systemctl reset-failed .. after removing wmf_auto_restart_rsync
* 23:16 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725386{{!}}Add image_suggestion_interaction event stream]] (duration: 01m 12s)
* 20:13 mutante: gerrit1001/gerrit2001: manually deleting list_mediawiki_extensions cron job ([[phab:T266024|T266024]])
* 23:02 legoktm: deleting old stretch docker images from the registry for [[phab:T292485|T292485]]
* 19:40 eileen: civicrm revision changed from {{Gerrit|bb7c08bf6d}} to {{Gerrit|4fdfb8408b}}, config revision is {{Gerrit|f16003ab62}}
* 22:24 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.2
* 18:35 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 22:20 brennen: 1.38.0-wmf.3 ([[phab:T281167|T281167]]) rolling back to testwikis for the day; will revisit in US-morning
* 17:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 20:47 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 17:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 20:44 brennen@deploy1002: Synchronized php-1.38.0-wmf.3/includes/page: Backport: [[gerrit:726594{{!}}Pre-format comments for non-local files too]] ([[phab:T292570|T292570]]) (duration: 01m 04s)
* 17:46 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 20:18 mutante: puppetmaster1003 et al - converting maxmind geoip database fetching from cron to timers
* 17:46 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 20:06 mutante: cumin 'puppetmaster*' "disable-puppet '[[phab:T288844|T288844]] - [[phab:T273673|T273673]] - gerrit:721595 - $<nowiki>{</nowiki>USER<nowiki>}</nowiki>'"
* 17:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:30 mutante: restoring /home/amire80 from and to mwmaint2002 via Bacula bconsole ([[phab:T292573|T292573]])
* 17:22 mutante: gerrit1001/2001 - sudo rm /var/www/mediawiki-extensions.txt
* 19:09 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.38.0-wmf.2
* 17:18 ejegg: updated payments-wiki from {{Gerrit|4c1503ad91}} to {{Gerrit|adc3369cb3}}
* 19:03 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 16:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 18:26 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.23 (duration: 01m 57s)
* 16:34 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 18:23 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.21 (duration: 04m 20s)
* 16:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 18:21 brennen: 1.38.0-wmf.3 ([[phab:T281167|T281167]]): pruning old branches, starting with 1.37.0-wmf.21, proceeeding to 1.37.0-wmf.23 if time allows
* 16:05 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 18:11 ppchelko@deploy1002: Synchronized wmf-config: Remove mb_strtoupper overrides for HHVM [[phab:T219279|T219279]] Php72ToUpper.php removal (duration: 01m 06s)
* 16:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 18:04 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove mb_strtoupper overrides for HHVM [[phab:T219279|T219279]] CS.php (duration: 01m 06s)
* 16:05 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 17:55 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]] (duration: 45m 59s)
* 16:05 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:12 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 15:59 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:09 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.3  refs [[phab:T281167|T281167]]
* 15:42 mepps: updated payments-wiki-staging from {{Gerrit|5fdd29bc16}} to {{Gerrit|4c1503ad91}}
* 17:03 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 15:25 ema: cp4032: downgrade varnish to 6.0.4 [[phab:T264398|T264398]]
* 17:02 btullis@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 15:13 ema: cp4032: varnish-frontend-restart with libvmod-netmapper 1.9-1 [[phab:T266567|T266567]]
* 17:02 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - btullis@cumin1001
* 14:55 ema: upload libvmod-netmapper 1.9-1 to buster-wikimedia component/varnish6 [[phab:T266567|T266567]]
* 16:56 brennen: successfully applied security patches for 1.38.0-wmf.3 train ([[phab:T281167|T281167]])
* 14:49 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 16:47 brennen: coordinated with deployment backup and starting train prep for 1.38.0-wmf.3 ([[phab:T281167|T281167]]), branched at {{Gerrit|65279490f82c785181b8b6961e40901a4aaafca4}}
* 14:48 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 15:57 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetboard2002.codfw.wmnet: Renew puppet certificate - jbond@cumin2002
* 14:40 _joe_: restarting envoyproxy on the jobrunners in codfw
* 15:57 jbond@cumin2002: START - Cookbook sre.puppet.renew-cert for puppetboard2002.codfw.wmnet: Renew puppet certificate - jbond@cumin2002
* 14:36 akosiaris: rolling restart of all pods in codfw changeprop-jobqueue
* 15:38 jbond: reimage puppetboard2002
* 14:27 _joe_: restart php-fpm on jobrunners in codfw
* 15:15 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetboard1002.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
* 14:17 cdanis: ran puppet on alert1001
* 15:15 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for puppetboard1002.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
* 14:16 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 15:10 moritzm: imported routinator 0.10.1-1bullseye to thirdparty/routinator for bullseye-wikimedia [[phab:T292503|T292503]]
* 14:15 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 14:58 jbond: reimage puppetboard1002
* 14:15 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 14:40 effie: depool  mw1455 and mw1422
* 14:11 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 14:30 Pchelolo: run foreachwiki uppercaseTitlesForUnicodeTransition.php --charmap current_to_php7_overrides.php [[phab:T219279|T219279]]
* 14:11 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 13:51 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor - Drop REL1_31, start REL1_37 (duration: 00m 57s)
* 14:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:46 Pchelolo: run renameInvalidUsernames.php --wiki loginwiki --list /tmp/rename_users_for_uppercase_all.txt [[phab:T219279|T219279]]
* 14:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:39 elukey@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - elukey@cumin1001
* 14:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 13:39 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - elukey@cumin1001
* 14:09 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 13:23 ppchelko@deploy1002: Synchronized php-1.38.0-wmf.2/maintenance/uppercaseTitlesForUnicodeTransition.php: Backport uppercaseTitlesForUnicodeTransition.php maintenance script improvements [[phab:T219279|T219279]] (duration: 00m 58s)
* 14:09 rzl@cumin1001: MediaWiki read-only period ends at: 2020-10-27 14:09:02.873019
* 12:53 ema: upload varnish 6.0.8-1wm1 to apt.wikimedia.org [[phab:T292290|T292290]]
* 14:09 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 12:43 elukey: import AMD ROCm 4.2 to buster-wikimedia's thirdparty/amd-rocm42 - [[phab:T287267|T287267]]
* 14:06 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:24 ema: deployment-cache-text06: upgrade varnish to 6.0.8-1wm1 [[phab:T292290|T292290]]
* 14:06 root@cumin1001: START - Cookbook sre.hosts.downtime
* 11:58 hnowlan: reverted restbase2023 to use CN=hostname certificate due to loading errors
* 14:05 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
* 11:57 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
* 14:04 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 11:57 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
* 14:04 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=99)
* 11:37 hnowlan@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
* 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 11:28 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2023.codfw.wmnet: Switching over to using FQDN certificate - hnowlan@cumin1001
* 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 11:17 hnowlan_: disabling puppet on cassandra nodes for rollout of 724061 - defaulting to cn=fqdn certificates
* 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 11:15 effie: upgrade scap to 4.0.2 - [[phab:T291095|T291095]]
* 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 11:12 urbanecm@deploy1002: Synchronized dblists/commonsuploads.dblist: {{Gerrit|04524992865b0ae5750eb6fb0a374aa74a65b383}}: Enable local uploads for tcywiki ([[phab:T166763|T166763]]) (duration: 00m 59s)
* 14:03 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 10:11 vgutierrez: update acme-chief to version 0.32 on acmechief hosts - [[phab:T290249|T290249]]
* 14:03 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 10:09 vgutierrez: update acme-chief to version 0.32 on acmechief-test hosts - [[phab:T290249|T290249]]
* 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 10:06 vgutierrez: upload acme-chief 0.32 to apt.wm.o (buster) - [[phab:T290249|T290249]]
* 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 09:46 hnowlan_: generated cassandra certificate using FQDN for restbase2023
* 14:02 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 09:09 topranks: updating routinator on rpki2001 ([[phab:T291543|T291543]])
* 14:02 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 08:59 dcausse: depool and restart blazegraph on wdqs1007
* 14:01 rzl@cumin1001: MediaWiki read-only period starts at: 2020-10-27 14:01:54.999830
* 08:51 moritzm: installing openssl security updates for stretch (buster/bullseye already fixed)
* 14:01 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 07:58 moritzm: installing apache security updates
* 13:56 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 07:57 elukey: upgrade GPU drivers (AMD ROCm 4.3.1) on an-worker1[096-101]
* 13:56 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 07:27 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:55 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:26 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:55 root@cumin1001: START - Cookbook sre.hosts.downtime
* 07:26 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1004.wmnet
* 13:54 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 06:38 elukey: reboot an-worker1096 after installing new GPU drivers
* 13:53 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 04:20 eileen: civicrm revision changed from {{Gerrit|d74e9aa0a1}} to {{Gerrit|34d3c3aae8}}, config revision is {{Gerrit|cae09f7691}}
* 13:50 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:49 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:47 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:46 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 13:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 13:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 13:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 13:15 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 13:10 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 13:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 13:04 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 13:01 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 12:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 12:55 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 12:51 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:35 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:25 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:21 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:14 ema: A:cp remove libvarnishapi1, replaced by libvarnishapi2 a while ago [[phab:T261487|T261487]]
* 11:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:12 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 11:06 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:54 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:46 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:40 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 10:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:21 XioNoX: update policies from-zone production to-zone junos-host on mr1-eqiad - [[phab:T265589|T265589]]
* 10:20 XioNoX: update policies from-zone production to-zone junos-host on mr1-eqsin - [[phab:T265589|T265589]]
* 10:19 XioNoX: update policies from-zone production to-zone junos-host on mr1-ulsfo - [[phab:T265589|T265589]]
* 10:15 XioNoX: update policies from-zone production to-zone junos-host on mr1-esams - [[phab:T265589|T265589]]
* 10:06 XioNoX: update policies from-zone production to-zone junos-host on mr1-codfw - [[phab:T265589|T265589]]
* 08:58 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=97)
* 08:55 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:39 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=97)
* 08:32 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:30 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0)
* 08:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 08:15 godog: update thanos-fe2002 to thanos 0.16.0 - [[phab:T261281|T261281]]
* 07:35 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 06:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 06:50 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-4
* 06:42 ryankemper: [[phab:T263970|T263970]] Set number of replicas to 2 (from previous value of 1) for all codfw indices matching `apifeatureusage*`, new shards have been assigned without issue


== 2020-10-26 ==
== 2021-10-04 ==
* 23:12 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Fix JS error when no topics set ([[phab:T266501|T266501]]) (duration: 01m 00s)
* 23:30 foks: resetting some emails used for abuse by a globally-banned user
* 22:30 mutante: netflow5001 - systemctl reset-failed
* 23:19 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:726084{{!}} Bumping portals to master (T128546)]] (duration: 00m 59s)
* 21:44 rzl: live test of sre.switchdc.mediawiki complete, the foregoing logging noise had no actual production impact
* 23:18 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:726084{{!}} Bumping portals to master (T128546)]] (duration: 00m 59s)
* 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|75645c9cc59b37dbf59942eabbc014b7dc147626}}: Add explicit config for licensing/copyright message overrides ([[phab:T284097|T284097]]) (duration: 00m 59s)
* 21:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 23:05 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images
* 21:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 22:54 mutante: puppetmaster2001 - rm /etc/logrotate.d/geoipupdate_ipinfo  and geoipupdate_ipinfo ; running puppet, starting logrotate service
* 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 18:13 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:41 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 16:51 bblack: rolling restart of haproxy for DoTLS on dns300[12],authdns1001,authdns2001 to recycle connections
* 21:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 15:24 vgutierrez: pool cp5006
* 21:40 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 15:17 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 15:16 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 14:50 phuedx: phuedx@mwmaint1002:~$ mwscript extensions/SecurePoll/cli/purgeDecryptionKeys.php --wiki=votewiki --before="20210101000000"
* 21:37 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-10-26 21:37:17.809596
* 14:46 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 14:46 effie: uploading scap 4.0.2 - [[phab:T291095|T291095]]
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 14:45 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 21:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 14:39 brennen: gitlab: upgrade to 14.3.2 (note there was an additional patch release on 2021-10-01) complete ([[phab:T292256|T292256]])
* 21:37 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 14:25 Amir1: cleaning up wb_changes_subscription rows from closed wikis ([[phab:T292440|T292440]])
* 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 14:24 brennen: gitlab: downtime for upgrade to 14.3.1
* 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 14:19 elukey: import AMD ROCm 4.3.1 packages in buster-wikimedia's thirdparty/amd-rocm431 - [[phab:T287267|T287267]]
* 21:36 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:36 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 14:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 14:13 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:725905{{!}}Explicitly enable dispatching and pruning for wikidata (T48643)]] (duration: 00m 58s)
* 21:35 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:35 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-10-26 21:35:20.837214
* 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:35 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 14:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T292256
* 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 14:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab2001.wikimedia.org with reason: upgrade gitlab2001 to new version https://phabricator.wikmiedia.org/T292256
* 21:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 14:01 ladsgroup@deploy1002: Synchronized wmf-config: Config: [[gerrit:725502{{!}}Enable dispatching via jobs everywhere (T48643)]] (duration: 01m 00s)
* 21:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 12:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 12:56 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725785{{!}}Enable dispatching for wikidatawiki and commonswiki (T292088)]] (duration: 01m 00s)
* 21:32 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 12:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:32 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 12:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:31 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Ganeti tests
* 21:31 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 12:02 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2026.codfw.wmnet with reason: Ganeti tests
* 21:31 rzl: starting a live test of sre.switchdc.mediawiki, which will create some logging noise but no actual production impact
* 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti2025.codfw.wmnet with reason: Ganeti tests
* 20:54 mutante: scandium rm /usr/local/bin/update_parsoid.sh (gerrit:636494)
* 12:02 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti2025.codfw.wmnet with reason: Ganeti tests
* 20:15 ladsgroup@deploy1001: Finished deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata ([[phab:T261326|T261326]]) (duration: 06m 53s)
* 12:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:08 ladsgroup@deploy1001: Started deploy [ores/deploy@6912889]: Deploy new version of articlequality for wikidata ([[phab:T261326|T261326]])
* 11:55 urbanecm: EU B&C window done
* 19:31 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:55 urbanecm@deploy1002: Synchronized multiversion/MWWikiversions.php: {{Gerrit|508cf5cc6d213373f7c9ba1cdef142ebc8398022}}: Let DB expressions intersect DB lists ([[phab:T290609|T290609]]) (duration: 00m 58s)
* 19:29 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:50 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a855078cf52d88cc2cd27a0adc7c6a680c80dd39}}: dewiki, nlwiki: Bump Growth features to 80% ([[phab:T288420|T288420]], [[phab:T285254|T285254]]) (duration: 00m 58s)
* 19:26 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:46 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: {{Gerrit|5728376}}: Update [[phab:T250887|T250887]] mitigations (duration: 00m 58s)
* 18:59 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Remove variant setting override (no-op) ([[phab:T265556|T265556]]) (duration: 00m 57s)
* 11:44 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b0a96bed4562bcc975187b1d34626201d407404b}}: Undeploy GettingStarted V: Remove now-obsolete logging channels ([[phab:T235752|T235752]]) (duration: 00m 59s)
* 18:55 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure $wgBabelCategoryNames on ndswiki ([[phab:T264990|T264990]]) (duration: 00m 58s)
* 11:42 urbanecm@deploy1002: Synchronized wmf-config/extension-list: {{Gerrit|9709bcfc8dacbcd1704471df08c31cec0711bea6}}: Undeploy GettingStarted IV: Dont build i18n ([[phab:T235752|T235752]]) (duration: 00m 58s)
* 18:51 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add www.legislation.gov.uk to $wgCopyUploadsDomains on commonswiki ([[phab:T265690|T265690]]) (duration: 00m 58s)
* 11:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d60f332785868797e7ecc9b5e410616d5604b392}}: Undeploy getting started III: Dont set wmgUseGettingStarted, now ignored ([[phab:T235752|T235752]]) (duration: 00m 58s)
* 18:47 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: Make variant D the default, remove variant A ([[phab:T265372|T265372]], [[phab:T265556|T265556]]) (duration: 00m 58s)
* 11:37 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|9eaf960c4b7c304be57dfc8d248aca0c6501d04c}}: Undeploy GettingStarted II: Dont load regardless of config ([[phab:T235752|T235752]]) (duration: 00m 58s)
* 18:46 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/vendor/wikimedia/parsoid/: Bump wikimedia/parsoid to v0.13.0-a13, enabling 6-element DSRs ([[phab:T266285|T266285]]) (duration: 00m 58s)
* 11:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1c7405ad1eb323a8da524819f17d6f1a66afaa57}}: Undeploy GettingStarted I: Disable on all wikis ([[phab:T235752|T235752]]) (duration: 00m 58s)
* 18:43 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/skins/Vector/: Fix logic in collapsibleTabs code ([[phab:T71729|T71729]]) (duration: 00m 58s)
* 11:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:724992{{!}}Remove deprecated SectionTranslationTargetLanguage config (T290302)]] (duration: 00m 58s)
* 18:21 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove wtp2001-wtp2020 from LinterSubmitterWhitelist ([[phab:T265558|T265558]]) (duration: 00m 59s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725042{{!}}Add wikisource-bot.toolforge.org to Commons copy upload list (T292213)]] (duration: 00m 59s)
* 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Make variant D the default on all wikis ([[phab:T265556|T265556]]) (duration: 00m 58s)
* 11:16 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:720058{{!}}Add IA-Upload tool domains to Commons wgCopyUploadsDomains (T287241)]] (duration: 00m 59s)
* 17:58 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 11:12 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 17:48 mutante: an-worker109* - systemctl reset-failed  to clear Icinga alerts related to wmf_auto_restart changes
* 11:10 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 17:45 mutante: releases2002,netmon2001, various other hosts - systemctl reset-failed  to clear Icinga alerts related to wmf_auto_restart changes
* 11:07 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:39 krinkle@deploy1001: Synchronized php-1.36.0-wmf.13/resources/src/mediawiki.util/: [[phab:T265809|T265809]], {{Gerrit|I1011f63ae61f5a6}} (duration: 01m 00s)
* 11:06 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 16:41 XioNoX: bounce security log on pfw3-eqiad - [[phab:T263833|T263833]]
* 11:04 effie: depool  wtp1026 for tests
* 16:29 XioNoX: set security-log traceoptions on pfw3-eqiad - [[phab:T263833|T263833]]
* 11:04 effie: pool  wtp1025
* 16:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:59 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 09:13 akosiaris: hbal -L -G row_C -X on ganeti01.svc.eqiad.wmnet
* 16:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@071f7c3] (eqiad): Increase mirrored traffic to 100% for eqiad (duration: 00m 54s)
* 16:00 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 08:58 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@071f7c3] (eqiad): Increase mirrored traffic to 100% for eqiad
* 15:51 rzl@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=apertium{{!}}api-gateway{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventgate-main{{!}}eventstreams{{!}}graphoid{{!}}kartotherian{{!}}mathoid{{!}}mobileapps{{!}}ores{{!}}parsoid{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}restbase{{!}}restbase-async{{!}}schema{{!}}search{{!}}sessionstore{{!}}termbox{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero,name=eqiad
* 07:37 joal@deploy1002: Finished deploy [analytics/refinery@38f3adc] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@38f3adc] (duration: 06m 14s)
* 15:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=zotero,name=eqiad
* 07:31 joal@deploy1002: Started deploy [analytics/refinery@38f3adc] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@38f3adc]
* 15:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wikifeeds,name=eqiad
* 07:30 joal@deploy1002: Finished deploy [analytics/refinery@38f3adc] (thin): Hotfix analytics deploy THIN [analytics/refinery@38f3adc] (duration: 00m 06s)
* 15:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=eqiad
* 07:30 joal@deploy1002: Started deploy [analytics/refinery@38f3adc] (thin): Hotfix analytics deploy THIN [analytics/refinery@38f3adc]
* 15:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 07:29 joal@deploy1002: Finished deploy [analytics/refinery@38f3adc]: Hotfix analytics deploy [analytics/refinery@38f3adc] (duration: 19m 18s)
* 15:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=termbox,name=eqiad
* 07:19 dcausse: restarting blazegraph on wdqs2001 & wdqs2004 (allocators burning too quickly)
* 15:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
* 07:18 elukey: depool + restart blazegraph + restart updater for wdqs1006
* 15:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=search,name=eqiad
* 07:18 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs1006.wmnet
* 15:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=schema,name=eqiad
* 07:18 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs1004.wmnet
* 15:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase-async,name=eqiad
* 07:10 joal@deploy1002: Started deploy [analytics/refinery@38f3adc]: Hotfix analytics deploy [analytics/refinery@38f3adc]
* 15:08 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=restbase,name=eqiad
* 07:02 godog: swift eqiad-prod: add weight to ms-be10[64-67] - [[phab:T290546|T290546]]
* 15:05 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=eqiad
* 06:44 elukey: depool + restart blazegraph + restart updater on wdqs1004
* 15:02 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=push-notifications,name=eqiad
* 05:50 ladsgroup@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:59 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=proton,name=eqiad
* 05:49 ladsgroup@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:56 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=parsoid,name=eqiad
* 05:47 ladsgroup@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:53 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
* 14:50 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mobileapps,name=eqiad
* 14:47 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=mathoid,name=eqiad
* 14:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@a1a1bd7]: Add api-portal and snmwiki (duration: 16m 43s)
* 14:44 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 14:41 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=graphoid,name=eqiad
* 14:38 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams,name=eqiad
* 14:35 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main,name=eqiad
* 14:32 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-logging-external,name=eqiad
* 14:30 ppchelko@deploy1001: Started deploy [restbase/deploy@a1a1bd7]: Add api-portal and snmwiki
* 14:29 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics-external,name=eqiad
* 14:26 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=eqiad
* 14:23 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=echostore,name=eqiad
* 14:20 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=cxserver,name=eqiad
* 14:17 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=citoid,name=eqiad
* 14:14 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=api-gateway,name=eqiad
* 14:11 rzl@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=apertium,name=eqiad
* 14:06 rzl@cumin1001: conftool action : set/ttl=10; selector: dnsdisc=apertium{{!}}api-gateway{{!}}citoid{{!}}cxserver{{!}}echostore{{!}}eventgate-analytics{{!}}eventgate-analytics-external{{!}}eventgate-logging-external{{!}}eventgate-main{{!}}eventstreams{{!}}graphoid{{!}}kartotherian{{!}}mathoid{{!}}mobileapps{{!}}ores{{!}}parsoid{{!}}proton{{!}}push-notifications{{!}}recommendation-api{{!}}restbase{{!}}restbase-async{{!}}schema{{!}}search{{!}}sessionstore{{!}}termbox{{!}}wdqs{{!}}wdqs-internal{{!}}wikifeeds{{!}}zotero,name=eqiad
* 13:48 moritzm: imported cas 6.2.4-1 to apt.wikimedia.org [[phab:T265857|T265857]]
* 13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 11:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bff6b37a55fe8f260fe00cbb942c53101167fb07}}: Add foto.digitalarkivet.no to wgCopyUploadsDomains whitelist of Wikimedia Commons ([[phab:T266390|T266390]]) (duration: 01m 14s)
* 11:27 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:27 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:26 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 11:11 vgutierrez: upgrade trafficserver to 8.0.8-1wm3 on cp4032 - [[phab:T265911|T265911]]
* 11:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99)
* 11:02 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers
* 10:51 vgutierrez: manually reloading nginx on cloudelastic[1005-1006]
* 10:29 vgutierrez: upload trafficserver 8.0.8-1wm3 to apt.wm.org (buster) - [[phab:T265911|T265911]]
* 10:18 godog: roll restart pybal to apply latest configuration
* 09:51 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-3
* 09:31 moritzm: restarting PHP FPM on mw canaries to pick up freetype update
* 09:04 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 08:58 moritzm: installing freetype security updates for stretch
* 08:57 XioNoX: remove down sessions to AS38758
* 08:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 08:43 XioNoX: remove down sessions to AS8560
* 08:41 XioNoX: remove down sessions to AS31334
* 08:28 XioNoX: remove down sessions to AS6327
* 08:27 XioNoX: remove down sessions to AS8674
* 08:25 XioNoX: remove down sessions to AS24429
* 08:21 XioNoX: remove down sessions to AS16509
* 06:59 _joe_: rolling restart of php7.2-fpm on the codfw jobrunners, to reduce the number of dangling transcodes after restarting cp-jobqueue for a deploy
* 06:59 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 06:16 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=jobrunner,dc=codfw,name=mw224.*
* 06:15 oblivian@cumin2001: conftool action : set/pooled=no; selector: cluster=videoscaler,dc=codfw,name=mw228.*
* 06:10 marostegui: Warm up tables [[phab:T261914|T261914]]


== 2020-10-25 ==
== 2021-10-03 ==
* 15:53 dwisehaupt: kernel upgrade and reboot for frdb1003
* 14:45 _joe_: restarting acmechief on acmechief1001
* 15:50 dwisehaupt: kernel upgrade and reboot for fran1001
* 12:55 kormat@cumin1001: dbctl commit (dc=all): 'Depool db1127, bad ram', diff saved to https://phabricator.wikimedia.org/P17414 and previous config saved to /var/cache/conftool/dbconfig/20211003-125530-kormat.json
* 08:24 elukey: powercycle cp5006 (unresponsive to ssh, remote tty available but not able to login as root, no prometheus metrics in hours)
* 08:23 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5006.eqsin.wmnet


== 2020-10-23 ==
== 2021-10-02 ==
* 22:56 mutante: added Nuria to "nda" LDAP group - leaving her in "wmf" until the actual last day - shell account remains so no puppet change needed in ldap_only_admins ([[phab:T266086|T266086]])
* 17:28 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:42 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:10 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 15:37 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:04 ema: rolling thumbor-instances restart to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/636012/ [[phab:T266155|T266155]]
* 12:47 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'eventrouter' .
* 10:57 kormat: uploaded orchestrator v3.2.3 to apt.wikimedia.org buster-wikimedia - [[phab:T266023|T266023]] (forgot to log this earlier)
* 10:56 volans: uploaded python3-wmflib_0.0.3 to apt.wikimedia.org buster-wikimedia - [[phab:T257905|T257905]]
* 10:09 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-2
* 09:51 moritzm: masking slapd on the old Stretch replicas to uncover potential direct access outside of the LVSes  [[phab:T264388|T264388]]
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:32 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:31 jayme: published docker-registry.discovery.wmnet/eventrouter:0.3.0-1
* 09:26 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:23 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 09:09 volans: upgrading spicerack to 0.0.44 on cumin hosts - [[phab:T257905|T257905]]


== 2020-10-22 ==
== 2021-10-01 ==
* 22:42 mutante: ganeti1001 - adding 2 more vcpus to VM testreduce1001 - [[phab:T257940|T257940]]
* 23:19 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 22:03 mutante: deploy1002 - armed keyholder, all deployment keys loaded [[phab:T265963|T265963]]
* 22:27 mutante: puppetmaster2001 - systemctl reset-failed
* 21:56 mutante: deploy1002 - scap pull  and added to mediawiki-installation "dsh" group - will be part of scap trains but just like any appserver ([[phab:T265963|T265963]])
* 22:16 mutante: puppetmaster2001 systemctl disable geoip_update_ipinfo.timer
* 20:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:15 mutante: puppetmaster2001 - sudo /usr/local/bin/geoipupdate_job after adding new shell command and timer - succesfully downloaded enterprise database for [[phab:T288844|T288844]]
* 20:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:56 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 19:13 mutante: deploy1002 currently cloning ALL the deployment repos - new setup
* 21:44 mutante: puppetmasters - temp. disabling puppet one more time, now for a different deploy, to fetch an additional MaxMind database - [[phab:T288844|T288844]]
* 18:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:19 mutante: puppetmaster2001 - puppet removed cron sync_volatile and cron sync_ca - starting and verifying new timers: 'systemctl status sync-puppet-volatile', 'systemctl status sync-puppet-ca' [[phab:T273673|T273673]]
* 18:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:12 mutante: puppetmaster1002, puppetmaster1003, puppetmaster2002, puppetmaster2003: re-enabled puppet, they are backends. backends don't have the sync cron/job/timer, so noop as well, just like 1004/1005/2004/2005. this just leaves the actual change on 2001  - [[phab:T273673|T273673]]
* 18:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:07 mutante: puppetmaster1004, puppetmaster1005, puppetmaster2004, puppetmaster2005: re-enabled puppet, they are "insetup" role
* 18:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:06 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend (duration: 00m 54s)
* 18:54 mutante: applying deployment_server role to new server deploy1002 - might show up in monitoring but is not prod yet, deploy1001 still is
* 21:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@d309a6e] (eqiad): tegola: reduce load to 50% during the weekend
* 18:34 mutante: adding mcrouter cert for deploy1002.eqiad.wmnet [[phab:T265963|T265963]]
* 21:05 mutante: puppetmaster1001 - re-enabled puppet, noop as expected, the passive host pulls from the active one, so only 2001 has the cron/job/timer
* 18:12 dpifke@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Expand  to group1 ([[phab:T123582|T123582]]) (duration: 00m 56s)
* 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 volans: cumin 'A:dns-rec' 'rec_control wipe-cache wikimedia.org$' - [[phab:T258729|T258729]]
* 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 chaomodus: Updating eqiad public network DNS to automation
* 21:01 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Revert "Have PdfHandler use Shellbox on Commons for 10% of requests" (duration: 00m 59s)
* 17:50 volans: cumin 'A:dns-rec' 'rec_control wipe-cache eqiad.wmnet$' - [[phab:T258729|T258729]]
* 20:58 mutante: temp disabling puppet on puppetmasters - deploying gerrit:724115 (gerrit:723310) [[phab:T273673|T273673]]
* 17:49 elukey: add thirdparty/bigtop14 to buster-wikimedia
* 18:58 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE
* 17:46 chaomodus: Updating eqiad private network DNS to automation
* 18:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE
* 17:21 bd808@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 18:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1002.eqiad.wmnet with reason: REIMAGE
* 17:21 bd808@cumin1001: Added views for new wiki: smnwiki [[phab:T264900|T264900]]
* 18:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1001.eqiad.wmnet with reason: REIMAGE
* 17:07 bd808@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:07 robh@cumin1001: END (ERROR) - Cookbook sre.experimental.reimage (exit_code=97) for host an-db1001.eqiad.wmnet
* 16:46 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:05 robh@cumin1001: START - Cookbook sre.experimental.reimage for host an-db1001.eqiad.wmnet
* 16:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:58 effie: depool mw1025, mw1319, mw1312 for test
* 14:56 moritzm: installing remaining mariadb-10.3 updates for buster (as packaged in Debian, not the wmf-mariadb package)
* 16:20 dancy: testing upcoming Scap 4.0.2 release on beta
* 14:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:04 bblack: C:envoyproxy (appservers and others): restarting envoyproxy
* 14:33 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 14:04 bblack: C:envoyproxy (appservers and others): ca-certificates updated via cumin to workaround [[phab:T292291|T292291]] issues
* 14:13 andrewbogott: upgrading mariadb on cloudcontrol1003, 1004, 1005
* 13:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:05 ottomata: bump camus version to wmf12 for all camus jobs.  should be no-op now. - [[phab:T251609|T251609]]
* 13:45 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Enable canary events for all eventgate-analytics-external bound streams - [[phab:T251609|T251609]] (duration: 01m 02s)
* 13:23 bblack: manually trying LE expired root workaround on mwdebug1001 with puppet disabled ...
* 13:55 moritzm: depooling ldap-eqiad-replica01/ldap-eqiad-replica02 [[phab:T264388|T264388]]
* 13:12 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 13:41 moritzm: pooling ldap-replica1001/1002 [[phab:T264388|T264388]]
* 13:11 gehel@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 13:10 moritzm: depooling ldap-replica2001/2002 [[phab:T264388|T264388]]
* 13:11 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.14
* 13:10 gehel@cumin1001: START - Cookbook sre.wdqs.data-reload
* 13:01 moritzm: pooling ldap-replica2004 [[phab:T264388|T264388]]
* 11:42 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:24 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams: Enable canary events for 3 eventgate-analytics bound streams - [[phab:T251609|T251609]] (duration: 01m 05s)
* 11:11 jynus: manually migrating some vms out of ganeti1009 to avoid excessive memory pressure
* 12:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|52ad2d4df1164dced684231c12aa64bd028b8ac9}}: Do not log logins at loginwiki via CU ([[phab:T253802|T253802]]) (duration: 01m 06s)
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17413 and previous config saved to /var/cache/conftool/dbconfig/20211001-105849-root.json
* 12:03 Urbanecm: [urbanecm@deploy1001 /srv/mediawiki-staging (master * u=)]$ sudo /usr/local/sbin/fix-staging-perms
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17412 and previous config saved to /var/cache/conftool/dbconfig/20211001-105735-root.json
* 11:59 Lucas_WMDE: EU backport&config window done
* 10:43 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d4caf6d] (eqiad): Increase mirrored traffic to 100% for eqiad (duration: 00m 49s)
* 11:58 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:635762{{!}}Enable propagatePageDeletion on Test Wikidata]], 2/2 (duration: 01m 04s)
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17411 and previous config saved to /var/cache/conftool/dbconfig/20211001-104345-root.json
* 11:57 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:635762{{!}}Enable propagatePageDeletion on Test Wikidata]], 1/2 (duration: 01m 02s)
* 10:43 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d4caf6d] (eqiad): Increase mirrored traffic to 100% for eqiad
* 11:54 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` in a tmux session updateVarDumps at mwmaint2001 (wiki=huwiki; [[phab:T246539|T246539]])
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17410 and previous config saved to /var/cache/conftool/dbconfig/20211001-104232-root.json
* 11:39 moritzm: restarting nginx on acmechief*, debmonitor*, schema*, puppetdb* to pick up freetype update
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17409 and previous config saved to /var/cache/conftool/dbconfig/20211001-102841-root.json
* 11:38 marostegui: Compare s1-s8 tables - [[phab:T261914|T261914]]
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17408 and previous config saved to /var/cache/conftool/dbconfig/20211001-102728-root.json
* 11:33 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17407 and previous config saved to /var/cache/conftool/dbconfig/20211001-101338-root.json
* 11:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InterwikiSortOrders.php: Config: [[gerrit:635813{{!}}Add ary, avk, awa, lld, shy and smn to InterwikiSortOrders.php]] (duration: 01m 08s)
* 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17406 and previous config saved to /var/cache/conftool/dbconfig/20211001-101224-root.json
* 11:31 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 10:00 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@c123ab9] (eqiad): Increase mirrored traffic to 80% for eqiad (duration: 00m 51s)
* 11:25 moritzm: restarting apache and smokeping* on netmon* to pick up freetype update
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@c123ab9] (eqiad): Increase mirrored traffic to 80% for eqiad
* 11:21 moritzm: correction: installing freetype security updates for buster (stretch TBD)
* 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17405 and previous config saved to /var/cache/conftool/dbconfig/20211001-095834-root.json
* 10:43 moritzm: installing freetype security updates for stretch/buster
* 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17404 and previous config saved to /var/cache/conftool/dbconfig/20211001-095720-root.json
* 10:33 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:55 marostegui: Upgrade db1164 and db1177
* 10:27 volans@cumin1001: START - Cookbook sre.dns.netbox
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177 and db1164 for upgrade', diff saved to https://phabricator.wikimedia.org/P17403 and previous config saved to /var/cache/conftool/dbconfig/20211001-095433-marostegui.json
* 09:38 arturo: merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/634050 change to network data yaml
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17402 and previous config saved to /var/cache/conftool/dbconfig/20211001-094913-root.json
* 08:31 kormat: enabling replication from eqiad to codfw [[phab:T261914|T261914]]
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17401 and previous config saved to /var/cache/conftool/dbconfig/20211001-094902-root.json
* 08:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:38 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=frwiki --force # to get an idea about timing for [[phab:T290609|T290609]], runs in a tmux session under my account
* 08:23 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17400 and previous config saved to /var/cache/conftool/dbconfig/20211001-093410-root.json
* 07:52 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17399 and previous config saved to /var/cache/conftool/dbconfig/20211001-093358-root.json
* 03:37 eileen: civicrm revision changed from {{Gerrit|4dce7bf535}} to {{Gerrit|bb7c08bf6d}}, config revision is {{Gerrit|9a522d03dd}}
* 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 03:13 eileen: civicrm revision changed from {{Gerrit|3c3dcf80ae}} to {{Gerrit|4dce7bf535}}, config revision is {{Gerrit|9a522d03dd}}
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17398 and previous config saved to /var/cache/conftool/dbconfig/20211001-091906-root.json
* 01:12 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@870829c]: 0.3.52 (duration: 09m 07s)
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17397 and previous config saved to /var/cache/conftool/dbconfig/20211001-091854-root.json
* 01:04 ryankemper: Tests passing on canary `wdqs1003`, proceeding with wdqs deploy for rest of fleet
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17396 and previous config saved to /var/cache/conftool/dbconfig/20211001-090402-root.json
* 01:03 ryankemper@deploy1001: Started deploy [wdqs/wdqs@870829c]: 0.3.52
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17395 and previous config saved to /var/cache/conftool/dbconfig/20211001-090351-root.json
* 09:02 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 09:00 _joe_: restarting pybal low-traffic in eqiad to pick up the drop of proxyfetch to kubernetes services
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17394 and previous config saved to /var/cache/conftool/dbconfig/20211001-084859-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17393 and previous config saved to /var/cache/conftool/dbconfig/20211001-084847-root.json
* 08:44 marostegui: Upgrade db1135 and db1172
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172 for upgrade', diff saved to https://phabricator.wikimedia.org/P17392 and previous config saved to /var/cache/conftool/dbconfig/20211001-084435-marostegui.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for upgrade', diff saved to https://phabricator.wikimedia.org/P17391 and previous config saved to /var/cache/conftool/dbconfig/20211001-084411-marostegui.json
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080 [[phab:T290868|T290868]]', diff saved to https://phabricator.wikimedia.org/P17390 and previous config saved to /var/cache/conftool/dbconfig/20211001-084345-marostegui.json
* 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
* 08:15 _joe_: restarting pybal in codfw to pick up config changes
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on testvm[2001,2003].codfw.wmnet with reason: Ganeti tests
* 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on testvm[2001,2003].codfw.wmnet with reason: Ganeti tests
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17388 and previous config saved to /var/cache/conftool/dbconfig/20211001-062846-root.json
* 06:27 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17387 and previous config saved to /var/cache/conftool/dbconfig/20211001-062453-root.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17386 and previous config saved to /var/cache/conftool/dbconfig/20211001-061342-root.json
* 06:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17385 and previous config saved to /var/cache/conftool/dbconfig/20211001-060949-root.json
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17384 and previous config saved to /var/cache/conftool/dbconfig/20211001-055838-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17383 and previous config saved to /var/cache/conftool/dbconfig/20211001-055445-root.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17382 and previous config saved to /var/cache/conftool/dbconfig/20211001-054335-root.json
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17381 and previous config saved to /var/cache/conftool/dbconfig/20211001-053942-root.json
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17380 and previous config saved to /var/cache/conftool/dbconfig/20211001-052831-root.json
* 05:26 marostegui: Upgrade db1114
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 for upgrade', diff saved to https://phabricator.wikimedia.org/P17379 and previous config saved to /var/cache/conftool/dbconfig/20211001-052509-marostegui.json
* 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17378 and previous config saved to /var/cache/conftool/dbconfig/20211001-052438-root.json
* 05:22 marostegui: Upgrade db1119
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for upgrade', diff saved to https://phabricator.wikimedia.org/P17377 and previous config saved to /var/cache/conftool/dbconfig/20211001-052133-marostegui.json
* 04:00 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Have PdfHandler use Shellbox on Commons for 10% of requests ([[phab:T289228|T289228]]) (duration: 00m 59s)
* 04:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:24 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 03:15 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .


== 2020-10-21 ==
== 2021-09-30 ==
* 23:16 catrope@deploy1001: Synchronized php-1.36.0-wmf.14/extensions/GrowthExperiments/: [[phab:T266033|T266033]] (duration: 01m 05s)
* 23:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:14 catrope@deploy1001: Synchronized php-1.36.0-wmf.13/extensions/GrowthExperiments/: [[phab:T265751|T265751]] [[phab:T265754|T265754]] (duration: 01m 08s)
* 23:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:38 mutante: testreduce1001 assigned 2 more GBs of RAM - rebooting ([[phab:T257940|T257940]], [[phab:T257906|T257906]])
* 23:51 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Put a https protocol into values (duration: 01m 00s)
* 19:44 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T264963|T264963]])
* 23:48 dpifke@deploy1002: Finished deploy [statsv/statsv@afeff42]: Deploy statsv with Kafka TLS support (not yet enabled) [[phab:T290131|T290131]] (duration: 00m 05s)
* 19:15 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T264963|T264963]])
* 23:48 dpifke@deploy1002: Started deploy [statsv/statsv@afeff42]: Deploy statsv with Kafka TLS support (not yet enabled) [[phab:T290131|T290131]]
* 18:13 Urbanecm: Morning B&C window done
* 23:41 dpifke@deploy1002: Finished deploy [performance/coal@1be49f8]: Deploy Coal with Kafka TLS support (not yet enabled) [[phab:T290131|T290131]] (duration: 01m 07s)
* 18:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|45312d359442d274e83deb7be80f86e12fb9e864}}: [WikibaseMediaInfo] Fix concept chips array nesting structure ([[phab:T256431|T256431]]) (duration: 01m 05s)
* 23:40 dpifke@deploy1002: Started deploy [performance/coal@1be49f8]: Deploy Coal with Kafka TLS support (not yet enabled) [[phab:T290131|T290131]]
* 18:12 mepps: updated payments-wiki-staging from {{Gerrit|db03677b2d}} to {{Gerrit|5fdd29bc16}}
* 23:39 dpifke@deploy1002: Finished deploy [performance/navtiming@29264fb]: Deploy Navtiming with Kafka TLS support (not yet enabled) [[phab:T290131|T290131]] (duration: 00m 05s)
* 18:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d94e33ff39b300c74fcaf08d1746c089fb1af783}}: cirrus: Hardcode more_like to codfw cirrus cluster (duration: 01m 05s)
* 23:39 dpifke@deploy1002: Started deploy [performance/navtiming@29264fb]: Deploy Navtiming with Kafka TLS support (not yet enabled) [[phab:T290131|T290131]]
* 17:56 XioNoX: configure FB PNI in eqdfw
* 23:34 ejegg: updated Fundraising CiviCRM from {{Gerrit|d4da344274}} to {{Gerrit|d74e9aa0a1}}
* 17:43 ppchelko@deploy1001: Synchronized php-1.36.0-wmf.14/skins/WikimediaApiPortal: Backport gerrit:635329, [[phab:T266021|T266021]] (duration: 01m 06s)
* 22:09 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 17:34 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch ParserCache to JSON on testwiki gerrit:635382 (duration: 01m 05s)
* 22:07 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 17:24 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 08s)
* 22:06 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' .
* 17:21 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable ParserCache logger for warn+, gerrit:635071 (duration: 01m 06s)
* 21:53 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:06 eileen: civicrm revision changed from {{Gerrit|2ecb8f0bcd}} to {{Gerrit|d4da344274}}, config revision is {{Gerrit|77cb7ec866}}
* 17:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:54 ryankemper: [WCQS] `ryankemper@wcqs1003:~$ sudo pool` (merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/725110 to unbreak readiness probe)
* 16:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:54 topranks: Routinator on rpki1001 upgraded to  0.10.0 and working again after force refresh.
* 16:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:49 brennen: gitlab1001: upgrade to 14.2.5 complete
* 16:57 mutante: scandium - disabling puppet so that Parsoid team can make some tests on testreduce1001 today
* 20:32 brennen: gitlab2001, gitlab1001: downtime for upgrades to 14.2.5
* 16:46 effie: restart php-fpm and pool mw2252 and mw2328
* 20:18 ryankemper: [WCQS] `ryankemper@wcqs1003:~$ sudo depool` (not sure why pybal can't depool it, the other 2 servers are pooled)
* 15:58 Lucas_WMDE: Deployed patch for [[phab:T260349|T260349]]
* 19:51 topranks: Updating routinator on rpki1001 [[phab:T291543|T291543]]
* 15:34 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:39 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 15:33 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:31 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 19:37 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' .
* 15:28 moritzm: updating prometheus-openldap-exporter to 0+git20171128-3 to buster-wikimedia
* 19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 jbond42: upgrade puppetlabs-stdlib to 6.5.0 https://gerrit.wikimedia.org/r/c/operations/puppet/+/634278
* 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:08 moritzm: imported prometheus-openldap-exporter 0+git20171128-3 to buster-wikimedia [[phab:T264388|T264388]]
* 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:02 otto@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster (duration: 02m 56s)
* 19:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.2  refs [[phab:T281166|T281166]]
* 15:01 crusnov@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:00 otto@deploy1001: Started deploy [analytics/refinery@e4d16f0] (hadoop-test): deploying with updated camus to test cluster
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:56 crusnov@cumin1001: START - Cookbook sre.dns.netbox
* 19:07 thcipriani@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/MobileFrontend: Backport: [[gerrit:724979{{!}}Fix search within pages alignment (T292107)]] (duration: 01m 09s)
* 14:44 reedy@deploy1001: Synchronized wmf-config/wikitech.php: Set CURLOPT_RETURNTRANSFER true in gerrit handler [[phab:T242554|T242554]] (duration: 01m 07s)
* 19:05 thcipriani@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/EventBus/includes/EventBus.php: Backport: [[gerrit:724481{{!}}Guard against undefined index notice when setting x-client-ip (T288853)]] (duration: 01m 09s)
* 14:34 dcausse: restarting blazegraph on codfw servers ([[phab:T263952|T263952]])
* 19:04 thcipriani@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/EventBus/includes/EventBus.php: Backport: [[gerrit:724480{{!}}Guard against undefined index notice when setting x-client-ip (T288853)]] (duration: 01m 09s)
* 13:21 moritzm: pooling ldap-replica2003 [[phab:T264388|T264388]]
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:04 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.14 (duration: 01m 04s)
* 18:58 thcipriani@deploy1002: Synchronized php-1.38.0-wmf.2/skins/Vector/resources/skins.vector.styles.legacy/components/MenuDropdown.less: Backport: [[gerrit:724798{{!}}Restore original more menu padding in legacy Vector (T289163)]] (duration: 01m 08s)
* 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.14
* 18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:40 matthiasmullie: EU B&C done
* 18:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:33 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [WikibaseMediaInfo] Add config for related terms API (duration: 01m 04s)
* 18:43 thcipriani@deploy1002: Scap failed!: Call to mwscript eval.php stderr: not empty
* 11:17 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|785404fa2b998947d236aebe481ee1abcbd14220}}: Disable registrations stat on Special:TranslationStats ([[phab:T264158|T264158]]) (duration: 01m 05s)
* 18:42 moritzm: imported gitlab 14.2.5 to thirdparty/gitlab [[phab:T292219|T292219]]
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11567427c3f7d2908b29046ee56a7b0c0da32c09}}: Enable ContentTranslation in 5 Wikipedias as a default tool ([[phab:T264737|T264737]]; [[phab:T264738|T264738]]; [[phab:T264739|T264739]]; [[phab:T264740|T264740]]; [[phab:T264741|T264741]]) (duration: 01m 30s)
* 18:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:00 marostegui: Upgrade db2093's mariadb version [[phab:T266003|T266003]]
* 18:38 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704167{{!}}Use Wikimania's logo in a new vector (T286405)]] Part III (duration: 01m 07s)
* 10:58 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:37 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikimania-wordmark.svg: Config: [[gerrit:704167{{!}}Use Wikimania's logo in a new vector (T286405)]] Part II (duration: 01m 07s)
* 10:56 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 18:35 thcipriani@deploy1002: Synchronized static/images/mobile/copyright/wikimania.svg: Config: [[gerrit:704167{{!}}Use Wikimania's logo in a new vector (T286405)]] part I (duration: 01m 07s)
* 10:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=rowiki; [[phab:T246539|T246539]])
* 18:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:37 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; [[phab:T246539|T246539]])
* 18:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:01 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=srwiki; [[phab:T246539|T246539]])
* 18:31 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:724514{{!}}Enable sticky header on beta cluster (T289721)]] (duration: 01m 08s)
* 10:00 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; [[phab:T246539|T246539]])
* 18:29 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 09:59 vgutierrez: Bump ECDHE-ECDSA-AES128-SHA pageview replacement to 100% - [[phab:T258405|T258405]]
* 18:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:42 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=nowiki; [[phab:T246539|T246539]])
* 18:27 otto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thorium.eqiad.wmnet
* 09:42 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; [[phab:T246539|T246539]])
* 18:22 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 09:38 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=shwiki; [[phab:T246539|T246539]])
* 18:20 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:724861{{!}}Disable legacy media dom on a few more wikis (T51097)]] (duration: 01m 08s)
* 09:37 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=warwiki; [[phab:T246539|T246539]]
* 18:07 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 09:30 Urbanecm: End of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; [[phab:T246539|T246539]])
* 17:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:23 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:49 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:22 root@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 17:49 otto@cumin1001: START - Cookbook sre.hosts.decommission for hosts thorium.eqiad.wmnet
* 09:21 aborrero@cumin2001: START - Cookbook sre.hosts.downtime
* 17:42 bstorm: updating packages for thirdparty/kubeadm-k8s-1-20 and thirdparty/kubeadm-k8s-1-19 in stretch-wikimedia on apt1001 [[phab:T292131|T292131]]
* 08:52 Urbanecm: Start of `mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log` (wiki=viwiki; [[phab:T246539|T246539]])
* 17:09 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad (duration: 00m 55s)
* 08:50 Urbanecm: mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=$wiki --print-orphaned-records-to=/tmp/urbanecm/$wiki-orphaned.log --progress-markers > $wiki.log # wiki=cebwiki; [[phab:T246539|T246539]]
* 17:08 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad
* 08:46 Urbanecm: [urbanecm@mwmaint2001 ~/updateVarDumps/output/group2-medium/output]$ mwscript extensions/AbuseFilter/maintenance/updateVarDumps.php --wiki=apiportalwiki # [[phab:T246539|T246539]]
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad (duration: 00m 08s)
* 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:02 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad
* 08:38 root@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 17:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 08:38 root@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:00 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad (duration: 00m 11s)
* 08:33 godog: swift codfw-prod: bump object weight for ms-be2057 - [[phab:T261633|T261633]]
* 17:00 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@8fbf87c] (eqiad): Increase mirrored traffic to 50% for eqiad
* 08:10 XioNoX: Upgrade Routinator 3000 to 0.8.0 on rpki1001 - [[phab:T266001|T266001]]
* 16:49 sukhe: restart dnsdist.service on doh[1001-1002,2001-2002,3001-3002,4001-4002,5001-5002].wikimedia.org
* 08:09 XioNoX: add Routinator 3000 0.8.0 to apt - [[phab:T266001|T266001]]
* 16:43 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a4d22]: Increase mirrored traffic to 10% (duration: 02m 33s)
* 07:58 elukey: update analytics-in4 filter on cr1/cr2-eqiad for https://gerrit.wikimedia.org/r/635319
* 16:40 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a4d22]: Increase mirrored traffic to 10%
* 04:35 ryankemper: re-enabled icinga notifications on all wdqs hosts now that `wdqs-updater` is healthy
* 16:38 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a4d22] (eqiad): Increase mirrored traffic to 10% (duration: 00m 40s)
* 16:37 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a4d22] (eqiad): Increase mirrored traffic to 10%
* 16:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:32 hnowlan: Ran `GRANT pg_monitor TO prometheus` for maps in eqiad and codfw to fix empty prometheus connection metrics
* 16:30 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a4d22] (eqiad): Increase mirrored traffic to 10% (duration: 00m 16s)
* 16:30 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a4d22] (eqiad): Increase mirrored traffic to 10%
* 16:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:11 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725032{{!}}Disable jQuery migrate in metawiki (T280944)]] (duration: 01m 09s)
* 16:08 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:725019{{!}}Enable dispatching via job to 10 prod wikis]] (duration: 01m 09s)
* 15:55 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
* 14:36 elukey: drop /etc/helmfile-defaults/private/backup_old_paths from deploy1002 (old data not needed anymore)
* 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17374 and previous config saved to /var/cache/conftool/dbconfig/20210930-143325-root.json
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17373 and previous config saved to /var/cache/conftool/dbconfig/20210930-143044-root.json
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17372 and previous config saved to /var/cache/conftool/dbconfig/20210930-141822-root.json
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17370 and previous config saved to /var/cache/conftool/dbconfig/20210930-141540-root.json
* 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17369 and previous config saved to /var/cache/conftool/dbconfig/20210930-140318-root.json
* 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17368 and previous config saved to /var/cache/conftool/dbconfig/20210930-140037-root.json
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17367 and previous config saved to /var/cache/conftool/dbconfig/20210930-134815-root.json
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17366 and previous config saved to /var/cache/conftool/dbconfig/20210930-134533-root.json
* 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
* 13:40 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 13:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 13:37 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 13:36 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17365 and previous config saved to /var/cache/conftool/dbconfig/20210930-133311-root.json
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17364 and previous config saved to /var/cache/conftool/dbconfig/20210930-133029-root.json
* 13:29 marostegui: Upgrade db1111
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for upgrade', diff saved to https://phabricator.wikimedia.org/P17363 and previous config saved to /var/cache/conftool/dbconfig/20210930-132831-marostegui.json
* 13:27 marostegui: Upgrade db1134
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 (s1) for upgrade', diff saved to https://phabricator.wikimedia.org/P17362 and previous config saved to /var/cache/conftool/dbconfig/20210930-132700-marostegui.json
* 13:26 marostegui: Upgrade db1133
* 13:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 13:02 urbanecm: Start server-side upload for 2 video files ([[phab:T292096|T292096]], [[phab:T291492|T291492]])
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17361 and previous config saved to /var/cache/conftool/dbconfig/20210930-130116-root.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17360 and previous config saved to /var/cache/conftool/dbconfig/20210930-130109-root.json
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17359 and previous config saved to /var/cache/conftool/dbconfig/20210930-124612-root.json
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17358 and previous config saved to /var/cache/conftool/dbconfig/20210930-124606-root.json
* 12:31 Reedy: downloading files for [[phab:T290900|T290900]] in screen on mwmaint1002
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17357 and previous config saved to /var/cache/conftool/dbconfig/20210930-123109-root.json
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17356 and previous config saved to /var/cache/conftool/dbconfig/20210930-123101-root.json
* 12:18 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 17s)
* 12:18 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
* 12:17 moritzm: adapted MX records to point to both mx1001.wikimedia.org and mx2001.wikimedia.org with equal weights [[phab:T286911|T286911]]
* 12:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 16s)
* 12:16 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17355 and previous config saved to /var/cache/conftool/dbconfig/20210930-121605-root.json
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17354 and previous config saved to /var/cache/conftool/dbconfig/20210930-121558-root.json
* 12:14 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 15s)
* 12:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
* 12:13 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 15s)
* 12:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
* 12:11 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 10s)
* 12:10 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
* 12:10 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors (duration: 00m 01s)
* 12:10 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@35b9174]: tegola: remove mirror_threshold variable because of parsing errors
* 12:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17353 and previous config saved to /var/cache/conftool/dbconfig/20210930-120102-root.json
* 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17352 and previous config saved to /var/cache/conftool/dbconfig/20210930-120054-root.json
* 12:00 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:58 hnowlan: imported wikidiff2_1.13.0-1/php-wikidiff2_1.13.0-1_amd64.deb to buster-wikimedia component/php72
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1 and s8) for upgrade', diff saved to https://phabricator.wikimedia.org/P17351 and previous config saved to /var/cache/conftool/dbconfig/20210930-115631-marostegui.json
* 11:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:47 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 03s)
* 11:47 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
* 11:47 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 01s)
* 11:47 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
* 11:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:46 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 01s)
* 11:46 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
* 11:44 effie: downgrading scap to 3.17.1-1 on maps* hosts - [[phab:T291990|T291990]]
* 11:43 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:724732{{!}}Make reply tool available as opt-out almost everywhere (phase 3) (T288485)]] (duration: 01m 07s)
* 11:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:35 kartik@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/DiscussionTools: Backport: [[gerrit:724789{{!}}Add a link to preferences within the Reply and New Discussion Tools (T291002)]] (duration: 01m 08s)
* 11:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:30 kartik@deploy1002: Synchronized php-1.38.0-wmf.1/extensions/DiscussionTools: Backport: [[gerrit:724788{{!}}Add a link to preferences within the Reply and New Discussion Tools (T291002)]] (duration: 01m 09s)
* 11:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:14 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:724458{{!}}Enable SectionTranslation in Igbo, Hausa, Yoruba Wikipedias (T290175)]] (duration: 01m 08s)
* 11:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 11:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:13 akosiaris: upgrade znuny to 6.0.37
* 10:06 godog: test bounce logstash on logstash1023
* 08:21 moritzm: installing nettle security updates on stretch
* 08:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2003.codfw.wmnet
* 07:49 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2003.codfw.wmnet
* 07:31 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin (duration: 00m 06s)
* 07:31 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: tegola: use eqiad discovery endpoin
* 07:03 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 06:58 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 06:56 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' .
* 06:48 marostegui: Deploy schema change on s8 codfw (lag will show up) [[phab:T270620|T270620]]
* 06:01 marostegui: Deploy schema change on s1 codfw (lag will show up) [[phab:T270620|T270620]]
* 05:53 marostegui: Deploy schema change on s3 codfw (lag will show up) [[phab:T270620|T270620]]
* 05:52 marostegui: Deploy schema change on s7 codfw (lag will show up) [[phab:T270620|T270620]]
* 05:47 marostegui: Deploy schema change on s5 codfw (lag will show up) [[phab:T270620|T270620]]
* 05:45 marostegui: Deploy schema change on s4 codfw (lag will show up) [[phab:T270620|T270620]]
* 05:45 marostegui: Deploy schema change on s2 codfw (lag will show up) [[phab:T270620|T270620]]


== 2020-10-20 ==
== 2021-09-29 ==
* 22:10 dwisehaupt: frmon2001 upgraded to buster with grafana 7.2.1
* 23:20 bd808@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 21:19 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:05 bd808@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 21:18 cdanis: ✔️ cdanis@mw2252.codfw.wmnet ~ 🕠🍺 sudo depool
* 23:02 bd808@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'toolhub' for release 'main' .
* 20:57 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 00m 08s)
* 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:56 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0] (thin): Regular analytics weekly train THIN [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
* 22:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:39 cdanis: doing some manual testing on mw2221, depooled and puppet disabled
* 21:57 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/includes/Timeline.php: Catch TimelineException from fixMap() ([[phab:T292126|T292126]]) (duration: 01m 07s)
* 20:33 mforns@deploy1001: Finished deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54] (duration: 08m 10s)
* 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:31 ryankemper: [Temporarily] disabled notifications for all wdqs hosts while we figure out how to unstick the updater process. Impact is that new updates will be delayed, but queries will still keep serving as normal, so fixing this is a priority but note that there's no availability outage
* 21:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:37 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/includes/Timeline.php: Bump Timeline::CACHE_VERSION (duration: 01m 08s)
* 20:25 mforns@deploy1001: Started deploy [analytics/refinery@e4d16f0]: Regular analytics weekly train [analytics/refinery@e4d16f08a96b6f65447fcdc6c9e8945724a89f54]
* 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:22 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.2  refs [[phab:T281166|T281166]] (duration: 01m 08s)
* 20:06 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:21 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.2  refs [[phab:T281166|T281166]]
* 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 20:16 jhuneidi@deploy1002: Finished scap: Fix pywikibot feature detection (duration: 13m 38s)
* 19:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:02 jhuneidi@deploy1002: Started scap: Fix pywikibot feature detection
* 19:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid,service=canary
* 19:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:24 razzi@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:06 legoktm@deploy1002: Synchronized php-1.38.0-wmf.2/extensions/timeline/scripts/renderTimeline.sh: Fix passing temp directory to EasyTimeline.pl (duration: 01m 07s)
* 18:56 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:48 effie: depooling mw2328 - [[phab:T266052|T266052]]
* 18:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:52 dancy@deploy1002: Synchronized php-1.38.0-wmf.2/skins/MinervaNeue/resources/skins.minerva.base.styles/ui.less: Backport: [[gerrit:724787{{!}}Search header should be vertically centered, not top aligned(take 2) (T292071)]] (duration: 01m 08s)
* 17:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
*