You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

imported>Stashbot
(razzi@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001)
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(166 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== 2021-01-29 ==
== 2021-08-03 ==
* 23:26 razzi@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:36 dancy@deploy1001: Finished scap: MW servers complaining about l10n files after .27 rollback (duration: 07m 22s)
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:29 dancy@deploy1001: Started scap: MW servers complaining about l10n files after .27 rollback
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 22:26 dancy@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:20 reedy@deploy1001: Synchronized php-1.36.0-wmf.27/includes/parser/CacheTime.php: CacheTime: Extra protection for rollback unserialization [[phab:T273007|T273007]] (duration: 01m 00s)
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:14 dancy@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.28
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 22:09 dancy@deploy1001: scap failed: average error rate on 8/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 21:42 razzi: rebalance kafka partitions for codfw.resource_change
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 21:40 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 19:26 razzi@cumin1001: END (FAIL) - Cookbook sre.kafka.reboot-workers (exit_code=99) for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 19:26 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 18:50 hashar: CI slightly overloaded due to a surge of library updates but is otherwise processing changes
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:31 reedy@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/WikiEditor/modules/jquery.wikiEditor.toolbar.config.js: [[phab:T273231|T273231]] (duration: 01m 02s)
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:56 effie: depool mw1403 and mw1405
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:46 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-presto1001.eqiad.wmnet
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 15:27 elukey@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-presto1001.eqiad.wmnet
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:58 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1007.eqiad.wmnet with reason: REIMAGE
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 14:56 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1007.eqiad.wmnet with reason: REIMAGE
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 13:50 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 13:50 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 13:50 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 13:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 13:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 13:48 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 13:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 13:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 13:47 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 13:16 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:05 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:05 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 13:05 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 13:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 13:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:02 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 12:38 hnowlan: uploaded osmborder_0.1.0-2~buster0 package to buster-wikimedia
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 12:00 gilles@deploy1001: Finished deploy [performance/coal@b0d3b59]: [[phab:T271208|T271208]] Filter out canary events (duration: 00m 06s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:00 gilles@deploy1001: Started deploy [performance/coal@b0d3b59]: [[phab:T271208|T271208]] Filter out canary events
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 11:42 dcausse@deploy1001: Synchronized wmf-config/unitConversionConfig.json: [[phab:T270252|T270252]]: Update unitConversionConfig.json (duration: 01m 01s)
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 11:39 gilles@deploy1001: Finished deploy [performance/navtiming@ae8310a]: [[phab:T271208|T271208]] Fix canary event check (duration: 00m 05s)
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 11:39 gilles@deploy1001: Started deploy [performance/navtiming@ae8310a]: [[phab:T271208|T271208]] Fix canary event check
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 11:26 gilles@deploy1001: Finished deploy [performance/navtiming@e7712c3]: [[phab:T271208|T271208]] Log instead of hard error on missing wiki field (duration: 00m 06s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 11:26 gilles@deploy1001: Started deploy [performance/navtiming@e7712c3]: [[phab:T271208|T271208]] Log instead of hard error on missing wiki field
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 11:06 gilles@deploy1001: Finished deploy [performance/navtiming@125f6be]: [[phab:T271208|T271208]] Ignore canary events (duration: 00m 05s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 11:06 gilles@deploy1001: Started deploy [performance/navtiming@125f6be]: [[phab:T271208|T271208]] Ignore canary events
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 11:04 elukey: upload presto-* version 0.246-1 packages to buster/stretch-wikimedia
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 10:54 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:45 jynus@cumin1001: START - Cookbook sre.hosts.decommission
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14050 and previous config saved to /var/cache/conftool/dbconfig/20210129-103505-root.json
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14049 and previous config saved to /var/cache/conftool/dbconfig/20210129-102001-root.json
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 10:18 vgutierrez: pool cp5006
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 10:17 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 16:59 hashar: Gerrit has been upgraded
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14048 and previous config saved to /var/cache/conftool/dbconfig/20210129-100458-root.json
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 09:51 jynus@cumin1001: START - Cookbook sre.hosts.decommission
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 09:50 vgutierrez: reboot cp5006
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14047 and previous config saved to /var/cache/conftool/dbconfig/20210129-094954-root.json
* 16:45 hashar: Stopping Gerrit for upgrade
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1078 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14046 and previous config saved to /var/cache/conftool/dbconfig/20210129-093451-root.json
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 09:32 marostegui: Expand lvs on db1155-db1175 [[phab:T258361|T258361]]
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 09:31 vgutierrez: depool cp5006
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 08:20 marostegui: Change buffer pool sizes on clouddb1013,1015,1017,1019 [[phab:T267090|T267090]]
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 07:11 marostegui: Upgrade pc2007 to 10.4.18 [[phab:T268457|T268457]]
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to clone db1175', diff saved to https://phabricator.wikimedia.org/P14044 and previous config saved to /var/cache/conftool/dbconfig/20210129-065529-marostegui.json
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 03:35 marostegui: Reload haproxy1018
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 02:42 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2251.codfw.wmnet
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:42 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2252.codfw.wmnet
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2252.codfw.wmnet
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:37 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2251.codfw.wmnet
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:04 krinkle@deploy1001: Synchronized wmf-config/profiler.php: {{Gerrit|If0c71a983772c}} (duration: 00m 58s)
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:49 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2252.codfw.wmnet with reason: REIMAGE
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 01:48 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2251.codfw.wmnet with reason: REIMAGE
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 01:46 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2252.codfw.wmnet with reason: REIMAGE
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 01:46 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2251.codfw.wmnet with reason: REIMAGE
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 01:09 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2253.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 01:07 mutante: repooled mw2248,mw2249 - jobrunners/videoscalers now on buster
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 01:06 mutante: repooled mw2048,mw2049 - jobrunners/videoscalers now on buster
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:06 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2253.codfw.wmnet
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 01:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2249.codfw.wmnet
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 01:05 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2248.codfw.wmnet
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 01:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2249.codfw.wmnet
* 12:47 moritzm: restarting Tomcat on idp1001
* 01:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2248.codfw.wmnet
* 12:05 moritzm: installing libgcrypt20 security updates
* 00:19 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2261.codfw.wmnet
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 00:14 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2262.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 00:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2283.codfw.wmnet
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master [[phab:T287574|T287574]]
* 09:12 moritzm: installing php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2021-01-28 ==
== 2021-08-02 ==
* 23:58 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2261.codfw.wmnet
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:58 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2262.codfw.wmnet
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 23:57 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2283.codfw.wmnet
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:52 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2253.codfw.wmnet with reason: REIMAGE
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:49 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2253.codfw.wmnet with reason: REIMAGE
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2248.codfw.wmnet with reason: REIMAGE
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2249.codfw.wmnet with reason: REIMAGE
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2248.codfw.wmnet with reason: REIMAGE
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2249.codfw.wmnet with reason: REIMAGE
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 23:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2283.codfw.wmnet with reason: reimaging
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 23:34 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2283.codfw.wmnet with reason: reimaging
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 23:33 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2262.codfw.wmnet with reason: REIMAGE
* 21:31 tzatziki: removing 1 file for legal compliance
* 23:31 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2283.codfw.wmnet with reason: REIMAGE
* 21:16 tzatziki: removing 7 files for legal compliance
* 23:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2261.codfw.wmnet with reason: REIMAGE
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 23:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2283.codfw.wmnet with reason: REIMAGE
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2262.codfw.wmnet with reason: REIMAGE
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2261.codfw.wmnet with reason: REIMAGE
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:14 mutante: reimaging jobrunners/videoscalers mw2248,mw2249
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 22:43 brennen@deploy1001: Synchronized php-1.36.0-wmf.27/includes/parser/CacheTime.php: [[gerrit:658688{{!}}CacheTime: Extra protection for rollback unserialization (T273007)]] (duration: 00m 57s)
* 19:00 urbanecm: Morning B&C window completed
* 22:41 bblack: eqiad lvs should be back to normal state now with everything working
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 22:39 bblack: lvs1014 - apply https://gerrit.wikimedia.org/r/659439
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 22:37 bblack: lvs1013 - testing https://gerrit.wikimedia.org/r/659439 (expect nop, worked on 1015!)
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:36 bblack: lvs1015 - testing https://gerrit.wikimedia.org/r/659439 (expect nop)
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:21 bblack: lvs1016 - trying https://gerrit.wikimedia.org/r/659439 on backup LVS...
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 22:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2287.codfw.wmnet
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2286.codfw.wmnet
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:20 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2285.codfw.wmnet
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 22:20 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2284.codfw.wmnet
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:16 bblack: disabling puppet on all eqiad lvs for https://gerrit.wikimedia.org/r/659439 risks
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:03 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2284.codfw.wmnet
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 22:03 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2285.codfw.wmnet
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 22:02 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2286.codfw.wmnet
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 22:02 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2287.codfw.wmnet
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 21:33 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1172.eqiad.wmnet with reason: REIMAGE
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 21:30 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: REIMAGE
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 21:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1172.eqiad.wmnet with reason: REIMAGE
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 21:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: REIMAGE
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:28 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.28
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2287.codfw.wmnet with reason: reimaging
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 21:28 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2287.codfw.wmnet with reason: reimaging
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 21:27 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2285.codfw.wmnet with reason: reimaging
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 21:27 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2285.codfw.wmnet with reason: reimaging
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 21:27 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2284.codfw.wmnet with reason: REIMAGE
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 21:25 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2286.codfw.wmnet with reason: REIMAGE
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 21:23 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2285.codfw.wmnet with reason: REIMAGE
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 21:23 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2287.codfw.wmnet with reason: REIMAGE
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2284.codfw.wmnet with reason: REIMAGE
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2285.codfw.wmnet with reason: REIMAGE
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2287.codfw.wmnet with reason: REIMAGE
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2286.codfw.wmnet with reason: REIMAGE
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 21:19 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.28 (duration: 01m 05s)
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 21:17 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.28
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 21:15 brennen: 1.36.0-wmf.28 train status ([[phab:T271342|T271342]]): blockers resolved, going to group1, to be followed shortly by all wikis
* 12:20 mutante: gerrit servers: disabling puppet
* 21:11 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/CentralAuth/includes/: Backport: [[gerrit:659362{{!}}Revert CentralAuthCreateLocalAccountJob changes in 9f79de4 (T273205)]] (duration: 01m 09s)
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 20:49 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/tests/phpunit/includes/parser/ParserOptionsTest.php: Backport: [[gerrit:659103{{!}}Make ParserOptions::isSafeToCache more robust (T273120)]] (duration: 01m 07s)
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 20:46 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/includes/parser/ParserOptions.php: Backport: [[gerrit:659103{{!}}Make ParserOptions::isSafeToCache more robust (T273120)]] (duration: 01m 08s)
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 20:25 bblack: lvs1014,lvs1016 - all back to "normal" state
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 20:24 bblack: lvs1014 - restart pybal
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 20:20 bblack: lvs1016 - restart pybal
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 20:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@911731d]: write articletopic and drafttopic to hourly tables (duration: 01m 44s)
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 20:13 bblack: lvs1014,lvs1016 - puppet temporarily disabled for new service config deploy - [[phab:T271476|T271476]]
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2223.codfw.wmnet
* 11:27 hashar: restarting Jenkins on contint2001
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2247.codfw.wmnet
* 11:27 hashar: restarting Jenkins on contint1001
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1264.eqiad.wmnet
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:13 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@911731d]: write articletopic and drafttopic to hourly tables
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:13 mutante: scap pulling and repooling: mw1264, mw2223, mw2247
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 20:11 bstorm@cumin1001: conftool action : set/pooled=yes; selector: name=dbproxy1019.eqiad.wmnet
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:10 bstorm@cumin1001: conftool action : set/pooled=yes; selector: name=dbproxy1018.eqiad.wmnet
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 20:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2223.codfw.wmnet
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2247.codfw.wmnet
* 11:13 urbanecm: EU B&C window completed
* 20:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1264.eqiad.wmnet
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 19:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
* 11:08 moritzm: installing openjdk-11 security updates
* 19:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 19:53 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ba1acd6]: airflow: start ores_predictions_daily one day earlier (duration: 01m 09s)
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 19:52 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ba1acd6]: airflow: start ores_predictions_daily one day earlier
* 07:24 moritzm: installing libsndfile security updates on buster
* 19:45 Urbanecm: Run mwscript namespaceDupes.php --wiki=frwikisource --add-prefix=BROKEN --fix ([[phab:T271939|T271939]])
* 07:12 moritzm: installing aspell security updates
* 19:44 Urbanecm: Run mwscript namespaceDupes.php --wiki=frwikisource --fix ([[phab:T271939|T271939]])
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0ae49093893316657ffd7cf56669a470fb073352}}: frwikisource: Add WS as an alias to NS_PROJECT ([[phab:T271939|T271939]]) (duration: 00m 57s)
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fd18092fd8b73414f6c320895601c83b883e29ee}}: Add image.laji.fi to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T270587|T270587]]) (duration: 01m 04s)
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)
* 19:36 jynus: extending backup1001 /dev/mapper/array1-archive partition to allocate enough space for helium backups [[phab:T238048|T238048]]
* 19:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|519350b86bd4afc8d4efc3c2f9b2631a0ced22c2}}: frwiktionary: Change babel category names per community request ([[phab:T270186|T270186]]) (duration: 00m 59s)
* 19:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 19:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3d0ca3a11a59063e5adfc126702032ea357e8524}}: Create patroller user group for thwiki ([[phab:T272149|T272149]]) (duration: 01m 07s)
* 19:20 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 19:19 mforns@deploy1001: Finished deploy [analytics/refinery@1e41f60] (thin): Regular analytics weekly train THIN [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562] (duration: 00m 08s)
* 19:19 mforns@deploy1001: Started deploy [analytics/refinery@1e41f60] (thin): Regular analytics weekly train THIN [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562]
* 19:15 mforns@deploy1001: Finished deploy [analytics/refinery@1e41f60]: Regular analytics weekly train [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562] (duration: 16m 53s)
* 19:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e914f1e65adfdf2f41af97363501b0ba3c40d5b8}}: robots: cawikimedia: Set wgDefaultRobotPolicy to noindex,nofollow ([[phab:T272871|T272871]]) (duration: 01m 08s)
* 19:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2247.codfw.wmnet with reason: REIMAGE
* 19:10 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@0742443]: hourly partitioning for ores tables (duration: 01m 25s)
* 19:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2223.codfw.wmnet with reason: REIMAGE
* 19:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2247.codfw.wmnet with reason: REIMAGE
* 19:09 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@0742443]: hourly partitioning for ores tables
* 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2223.codfw.wmnet with reason: REIMAGE
* 19:07 cdanis: decom Zayo IP transit on cr2-codfw [[phab:T272675|T272675]]
* 19:06 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable canary events for mediawiki_revision_recommendation_create (duration: 01m 12s)
* 19:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
* 19:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
* 18:58 cdanis: draining traffic from Zayo OGYX/123447 codfw<>ulsfo in preparation for decommission 🥃 [[phab:T272675|T272675]]
* 18:58 mforns@deploy1001: Started deploy [analytics/refinery@1e41f60]: Regular analytics weekly train [analytics/refinery@1e41f608fad96e7a9f77eb28cd1c082a0a01d562]
* 18:58 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Remove [[phab:T257687|T257687]] mitigations (duration: 01m 10s)
* 18:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1159.eqiad.wmnet with reason: REIMAGE
* 18:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1159.eqiad.wmnet with reason: REIMAGE
* 18:34 mutante: reimaging another canary appserver, mw1264, so that we will have at least 2 stretch and 2 buster canaries for the transitional period
* 18:30 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:26 bblack@cumin1001: START - Cookbook sre.dns.netbox
* 17:49 jgleeson: fundraising-tools tools updated from {{Gerrit|41cab089da}} to {{Gerrit|d64b2f8cee}}
* 17:38 crusnov@deploy1001: Finished deploy [netbox/deploy@52d6fb9]: Test deploy of 2.10.4 to netbox-next [[phab:T265084|T265084]] (duration: 01m 18s)
* 17:37 crusnov@deploy1001: Started deploy [netbox/deploy@52d6fb9]: Test deploy of 2.10.4 to netbox-next [[phab:T265084|T265084]]
* 17:35 crusnov@deploy1001: Started deploy [netbox/deploy@52d6fb9]: Test deploy of 2.10.4 to netbox-next [[phab:T265084|T265084]]
* 17:28 ebernhardson: ban elastic1063 from production-search-omega-eqiad and production-search-eqiad [[phab:T265113|T265113]]
* 17:11 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 06s)
* 16:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
* 16:51 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 16:51 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 16:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
* 16:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 16:49 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 16:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
* 16:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 16:48 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 16:45 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=99) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:44 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:44 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:41 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:41 arturo: running homer on cr*-eqiad* again for reverting latest changes ([[phab:T271476|T271476]])
* 16:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
* 16:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 16:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 16:28 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'apertium' for release 'plain' .
* 16:26 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 16:25 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'plain' .
* 16:25 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 16:24 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:24 akosiaris: stop scraping apertium from prometheus, it doesn't have a prometheus endpoint.
* 16:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'production' .
* 16:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'plain' .
* 16:23 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'apertium' for release 'staging' .
* 16:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:17 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:06 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:03 arturo: running homer on cr*-eqiad* for [[phab:T271476|T271476]]
* 15:55 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 15:54 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 15:52 cdanis: draining traffic from Zayo OGYX/120003 codfw<>eqiad in preparation for decommission 🥃 [[phab:T272675|T272675]]
* 15:49 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 15:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d0a6933]: align threshold path references across days (duration: 01m 15s)
* 15:49 marostegui: Power off clouddb1019 for memory replacement [[phab:T272125|T272125]]
* 15:48 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d0a6933]: align threshold path references across days
* 15:25 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate NavigationTiming schemas to Event Platform on all wikis - [[phab:T271208|T271208]] (duration: 01m 11s)
* 15:06 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 15:05 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:26 jayme@deploy1001: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:14 jayme@deploy1001: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148 after kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P14039 and previous config saved to /var/cache/conftool/dbconfig/20210128-141425-marostegui.json
* 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P14038 and previous config saved to /var/cache/conftool/dbconfig/20210128-135730-marostegui.json
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14037 and previous config saved to /var/cache/conftool/dbconfig/20210128-135612-root.json
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14036 and previous config saved to /var/cache/conftool/dbconfig/20210128-135602-root.json
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14035 and previous config saved to /var/cache/conftool/dbconfig/20210128-134109-root.json
* 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14034 and previous config saved to /var/cache/conftool/dbconfig/20210128-134057-root.json
* 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14033 and previous config saved to /var/cache/conftool/dbconfig/20210128-132605-root.json
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14032 and previous config saved to /var/cache/conftool/dbconfig/20210128-132553-root.json
* 13:17 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14031 and previous config saved to /var/cache/conftool/dbconfig/20210128-131101-root.json
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14030 and previous config saved to /var/cache/conftool/dbconfig/20210128-131050-root.json
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Remove es1024's weight', diff saved to https://phabricator.wikimedia.org/P14029 and previous config saved to /var/cache/conftool/dbconfig/20210128-125631-marostegui.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14028 and previous config saved to /var/cache/conftool/dbconfig/20210128-125558-root.json
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14027 and previous config saved to /var/cache/conftool/dbconfig/20210128-125546-root.json
* 12:48 dcausse: European mid-day backport window done
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 100%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14026 and previous config saved to /var/cache/conftool/dbconfig/20210128-123800-root.json
* 12:32 dcausse@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CirrusSearch/: Add an option to limit the size of the file_text field: [[phab:T271493|T271493]] (duration: 01m 09s)
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 80%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14025 and previous config saved to /var/cache/conftool/dbconfig/20210128-122256-root.json
* 12:22 marostegui: Reboot db1146:3312 db1146:3314
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312, db1146:3314 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P14024 and previous config saved to /var/cache/conftool/dbconfig/20210128-122118-marostegui.json
* 12:12 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T271493|T271493]]: [cirrus] set 50kb limit on file text indexing for commons (duration: 01m 09s)
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 70%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14023 and previous config saved to /var/cache/conftool/dbconfig/20210128-120752-root.json
* 12:07 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T266027|T266027]]: [cirrus] Switch to perfield builder for spaceless languages (duration: 01m 06s)
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 50%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14022 and previous config saved to /var/cache/conftool/dbconfig/20210128-115249-root.json
* 11:45 gilles@deploy1001: Finished deploy [performance/navtiming@446e5df]: (no justification provided) (duration: 00m 05s)
* 11:45 gilles@deploy1001: Started deploy [performance/navtiming@446e5df]: (no justification provided)
* 11:37 vgutierrez: upgrade pybal to 1.15.9 in esams
* 11:30 elukey: disable nginx proxy buffering on archiva.wikimedia.org for a perf test - [[phab:T252767|T252767]]
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 30%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14020 and previous config saved to /var/cache/conftool/dbconfig/20210128-112242-root.json
* 11:21 vgutierrez: upgrade pybal to 1.15.9 in eqiad
* 11:20 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'es1025 (re)pooling @ 10%: Repooling after the restart', diff saved to https://phabricator.wikimedia.org/P14019 and previous config saved to /var/cache/conftool/dbconfig/20210128-110739-root.json
* 11:04 marostegui: Restart mysql on es1025  [[phab:T266483|T266483]]
* 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1025 [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P14018 and previous config saved to /var/cache/conftool/dbconfig/20210128-110353-marostegui.json
* 11:01 _joe_: restarting php-fpm on the appserver, api and jobrunner clusters in eqiad, 10% at a time, to simulate scap rolling restarts [[phab:T266055|T266055]]
* 10:52 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool es5 on writes [[phab:T266483|T266483]] (duration: 01m 05s)
* 10:46 marostegui: Restart mysql on es1024  [[phab:T266483|T266483]]
* 10:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool es5 from writes [[phab:T266483|T266483]] (duration: 01m 09s)
* 10:33 _joe_: performing a test run of the rolling restart of php-fpm in codfw, using the same code scap will use [[phab:T266055|T266055]]. Starting from the api cluster, then proceeding with the others
* 10:15 _joe_: upgrading pybal on lvs2008
* 10:11 _joe_: upgrading pybal on lvs2009
* 10:10 vgutierrez: upgrade pybal to 1.15.9 in eqsin
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14017 and previous config saved to /var/cache/conftool/dbconfig/20210128-095642-root.json
* 09:48 _joe_: upgrading pybal to 1.15.9 in codfw, starting from lvs2010
* 09:47 jbond42: upload new cas package to apt
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 80%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14016 and previous config saved to /var/cache/conftool/dbconfig/20210128-094139-root.json
* 09:30 _joe_: upgrading pybal on lvs4006
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 70%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14015 and previous config saved to /var/cache/conftool/dbconfig/20210128-092635-root.json
* 09:25 _joe_: upgrading pybal on lvs4005
* 09:11 _joe_: installing pybal 1.15.9 on lvs4007
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 60%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14014 and previous config saved to /var/cache/conftool/dbconfig/20210128-091131-root.json
* 09:08 moritzm: installing perf updates on Stretch
* 09:06 marostegui: Testing wikitech
* 09:00 _joe_: uploading pybal 1.15.9 to apt.wikimedia.org
* 08:58 moritzm: installing perf updates on Buster
* 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14013 and previous config saved to /var/cache/conftool/dbconfig/20210128-085627-root.json
* 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 40%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14012 and previous config saved to /var/cache/conftool/dbconfig/20210128-084123-root.json
* 08:34 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14011 and previous config saved to /var/cache/conftool/dbconfig/20210128-083347-root.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14010 and previous config saved to /var/cache/conftool/dbconfig/20210128-083337-root.json
* 08:32 vgutierrez: pool cp1087 - [[phab:T273153|T273153]]
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 30%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14009 and previous config saved to /var/cache/conftool/dbconfig/20210128-082620-root.json
* 08:20 vgutierrez: restart purged on cp1087 - [[phab:T273153|T273153]]
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14008 and previous config saved to /var/cache/conftool/dbconfig/20210128-081843-root.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14007 and previous config saved to /var/cache/conftool/dbconfig/20210128-081834-root.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 20%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14006 and previous config saved to /var/cache/conftool/dbconfig/20210128-081116-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14005 and previous config saved to /var/cache/conftool/dbconfig/20210128-080340-root.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14004 and previous config saved to /var/cache/conftool/dbconfig/20210128-080330-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 15%: Pooling for the first time very slowly', diff saved to https://phabricator.wikimedia.org/P14003 and previous config saved to /var/cache/conftool/dbconfig/20210128-075613-root.json
* 07:54 moritzm: installing tomcat9 security updates
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14002 and previous config saved to /var/cache/conftool/dbconfig/20210128-074836-root.json
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P14001 and previous config saved to /var/cache/conftool/dbconfig/20210128-074827-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1169 some more minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14000 and previous config saved to /var/cache/conftool/dbconfig/20210128-073426-marostegui.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13999 and previous config saved to /var/cache/conftool/dbconfig/20210128-073333-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13998 and previous config saved to /var/cache/conftool/dbconfig/20210128-073323-root.json
* 07:25 elukey: powercycle cp1087 (after depooling it)
* 07:24 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1087.eqiad.wmnet
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P13997 and previous config saved to /var/cache/conftool/dbconfig/20210128-072154-marostegui.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P13996 and previous config saved to /var/cache/conftool/dbconfig/20210128-072120-marostegui.json
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1169 some more minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13995 and previous config saved to /var/cache/conftool/dbconfig/20210128-072036-marostegui.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1169 to s1 for the first time, with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13994 and previous config saved to /var/cache/conftool/dbconfig/20210128-063806-marostegui.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1169 to dbctl [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13993 and previous config saved to /var/cache/conftool/dbconfig/20210128-063655-marostegui.json
* 03:03 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 03:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 02:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2291.codfw.wmnet
* 02:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2290.codfw.wmnet
* 02:13 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2288.codfw.wmnet
* 02:05 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2288.codfw.wmnet
* 02:05 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2290.codfw.wmnet
* 02:05 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2291.codfw.wmnet
* 02:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 02:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 01:35 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2291.codfw.wmnet with reason: REIMAGE
* 01:35 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:35 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:33 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2291.codfw.wmnet with reason: REIMAGE
* 01:33 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2290.codfw.wmnet with reason: REIMAGE
* 01:32 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:32 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:32 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:32 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2288.codfw.wmnet with reason: reimaging
* 01:31 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2288.codfw.wmnet with reason: REIMAGE
* 01:31 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2290.codfw.wmnet with reason: REIMAGE
* 01:31 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2288.codfw.wmnet with reason: REIMAGE
* 01:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 01:10 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2294.codfw.wmnet
* 01:09 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2293.codfw.wmnet
* 01:09 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2292.codfw.wmnet
* 00:56 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2294.codfw.wmnet
* 00:56 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2293.codfw.wmnet
* 00:56 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2292.codfw.wmnet
* 00:50 Urbanecm: Evening B&C done
* 00:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87c304c5439b1b7898f951db61d0a0a8a11ee4f7}}: Disable max-width on page namespace for wikisource ([[phab:T260091|T260091]]; 2nd take) (duration: 01m 00s)
* 00:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1404.eqiad.wmnet
* 00:41 foks: reset email for User:Uwe Martens
* 00:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1404.eqiad.wmnet
* 00:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1404.wmnet
* 00:33 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/includes/: {{Gerrit|c5c39ba8b3fce3f946e161191b814446aa5c1f4b}}: Fix fetching ipblock-exempt within BlockManager::getUserBlock ([[phab:T271551|T271551]], [[phab:T270145|T270145]]) (duration: 01m 04s)
* 00:32 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2293.codfw.wmnet with reason: reimaging
* 00:32 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2293.codfw.wmnet with reason: reimaging
* 00:31 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/includes/: {{Gerrit|a67fe4f7cbf172b82153aaceaa93a067cdff2ae4}}: Fix fetching ipblock-exempt within BlockManager::getUserBlock ([[phab:T271551|T271551]], [[phab:T270145|T270145]]) (duration: 01m 07s)
* 00:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2292.codfw.wmnet with reason: REIMAGE
* 00:26 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/GrowthExperiments/includes/HomepageModules/BaseModule.php: {{Gerrit|5417e0c8518b54144b99c963a1bbff3d15a00b32}}: Fix BaseModule::BASE_CSS_CLASS visibility ([[phab:T273099|T273099]]) (duration: 01m 00s)
* 00:26 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2294.codfw.wmnet with reason: REIMAGE
* 00:24 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2293.codfw.wmnet with reason: REIMAGE
* 00:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2292.codfw.wmnet with reason: REIMAGE
* 00:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2293.codfw.wmnet with reason: REIMAGE
* 00:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2294.codfw.wmnet with reason: REIMAGE
* 00:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 00:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 00:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1404.eqiad.wmnet with reason: REIMAGE
* 00:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1404.eqiad.wmnet with reason: REIMAGE
* 00:12 urbanecm@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
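
The entries above follow a regular shape: a UTC time, an actor (optionally with <code>@host</code>), and a free-form message. As a minimal sketch, assuming that shape holds for every line, the following Python splits an entry into those parts and pulls the host and percentage out of the dbctl "(re)pooling @ N%" messages. The names <code>Entry</code>, <code>parse_entry</code> and <code>repool_step</code> are hypothetical helpers made up for illustration; they are not part of dbctl, conftool or any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """One log line split into its three visible parts (hypothetical type)."""
    time: str     # "HH:MM" as written in the log
    actor: str    # e.g. "marostegui@cumin1001" or "_joe_"
    message: str  # everything after the first "actor: "


# Assumed line shape: "* HH:MM actor: message"
ENTRY_RE = re.compile(r"^\* (\d{2}:\d{2}) ([^:\s]+): (.*)$")

# Assumed dbctl message shape: "'<instance> (re)pooling @ <N>%: ..."
REPOOL_RE = re.compile(r"'(\S+) \(re\)pooling @ (\d+)%")


def parse_entry(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None if the line is not a log bullet."""
    match = ENTRY_RE.match(line.strip())
    return Entry(*match.groups()) if match else None


def repool_step(entry: Entry) -> Optional[Tuple[str, int]]:
    """Return (instance, percent) for a '(re)pooling @ N%' entry, else None."""
    match = REPOOL_RE.search(entry.message)
    return (match.group(1), int(match.group(2))) if match else None


if __name__ == "__main__":
    sample = (
        "* 07:33 marostegui@cumin1001: dbctl commit (dc=all): "
        "'db1144:3314 (re)pooling @ 10%: After upgrading the kernel'"
    )
    entry = parse_entry(sample)
    print(entry.time, entry.actor, repool_step(entry))
```

Applied to the db1144:3314 entries of this day, such a parser would surface the 10%, 25%, 50%, 75%, 100% ramp recorded at 07:33, 07:48, 08:03, 08:18 and 08:33.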


== 2021-07-31 ==
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}

== 2021-01-27 ==
* 23:30 shdubsh: reboot logstash2006
* 22:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2246.codfw.wmnet
* 22:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2246.codfw.wmnet
* 22:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2222.codfw.wmnet
* 22:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2222.codfw.wmnet
* 22:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1405.eqiad.wmnet
* 22:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1405.eqiad.wmnet
* 21:57 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.28
* 21:51 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae24e12]: repoint ores thresholds to yesterday (duration: 02m 23s)
* 21:48 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ae24e12]: repoint ores thresholds to yesterday
* 21:17 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@1c9d487]: airflow: hourly tasks must wait for yesterdays daily task (duration: 07m 54s)
* 21:09 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1c9d487]: airflow: hourly tasks must wait for yesterdays daily task
* 21:09 ebernhardson@deploy1001: deploy aborted: airflow: hourly tasks must wait for yesterdays daily tank (duration: 00m 00s)
* 21:09 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1c9d487]: airflow: hourly tasks must wait for yesterdays daily tank
* 20:58 brennen@deploy1001: Synchronized php-1.36.0-wmf.28/includes/libs/objectcache/RedisBagOStuff.php: Backport: [[gerrit:658780{{!}}objectcache: fix broken for loop in RedisBagOStuff::doSetMulti() (T273006)]] (duration: 01m 07s)
* 20:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2246.codfw.wmnet with reason: REIMAGE
* 20:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2246.codfw.wmnet with reason: REIMAGE
* 20:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2222.codfw.wmnet with reason: REIMAGE
* 20:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2222.codfw.wmnet with reason: REIMAGE
* 20:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2299.codfw.wmnet
* 20:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2217.codfw.wmnet
* 20:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2217.codfw.wmnet
* 20:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2221.codfw.wmnet
* 20:37 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2221.codfw.wmnet
* 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1405.eqiad.wmnet with reason: REIMAGE
* 20:30 brennen: 1.36.0-wmf.28 ([[phab:T271342|T271342]]): taking over train while dancy is afk; waiting on [[gerrit:658939]] to merge and will sync for verification on testwikis
* 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1405.eqiad.wmnet with reason: REIMAGE
* 20:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2216.codfw.wmnet
* 20:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2218.codfw.wmnet
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2219.codfw.wmnet
* 20:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1263.eqiad.wmnet
* 20:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2216.codfw.wmnet
* 20:07 urbanecm@deploy1001: Synchronized logos/config.yaml: {{Gerrit|6c5dd65e6138eb32db8059720a2149d4728763e7}}: Undeploy cswiki birthday logo (duration: 01m 05s)
* 20:06 urbanecm@deploy1001: Synchronized wmf-config/logos.php: {{Gerrit|6c5dd65e6138eb32db8059720a2149d4728763e7}}: Undeploy cswiki birthday logo (duration: 01m 06s)
* 20:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2218.codfw.wmnet
* 20:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2219.codfw.wmnet
* 20:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1263.eqiad.wmnet
* 19:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2221.codfw.wmnet with reason: REIMAGE
* 19:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2221.codfw.wmnet with reason: REIMAGE
* 19:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|53419ab6c0f2c306a68edb8979106bd42536211a}}: arwiki: Configure wgGEHomepageManualAssignmentMentorsList ([[phab:T273060|T273060]]) (duration: 00m 59s)
* 19:19 elukey: reboot an-launcher1002 for kernel upgrades
* 19:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cabb2e2009f97bb86c1b8827c3cc61cc991c41a9}}: Declare 6 more NavigationTiming eventlogging streams and migrate on testwiki ([[phab:T271208|T271208]]) (duration: 01m 00s)
* 19:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9382a9879bd6823fd664c0d3721fd0a9dc0d56d8}}: Migrate WebUIActionsTracking schemas to Event Platform on all wikis ([[phab:T267347|T267347]],[[phab:T271164|T271164]]) (duration: 01m 03s)
* 19:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2215.codfw.wmnet
* 18:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2215.codfw.wmnet
* 18:50 mutante: testreduce1001 - making nginx listen on IPv6 and restarting it [[phab:T266509|T266509]]
* 18:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
* 18:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
* 18:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2219.codfw.wmnet with reason: REIMAGE
* 18:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2219.codfw.wmnet with reason: REIMAGE
* 18:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2218.codfw.wmnet with reason: REIMAGE
* 18:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2218.codfw.wmnet with reason: REIMAGE
* 18:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2217.codfw.wmnet with reason: REIMAGE
* 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2217.codfw.wmnet with reason: REIMAGE
* 18:30 Tchanders: Creating the table securepoll_log in votewiki and testwiki ([[phab:T271270|T271270]])
* 18:25 hashar@deploy1001: Finished deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4 (duration: 00m 07s)
* 18:25 hashar@deploy1001: Started deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4
* 18:25 hashar@deploy1001: Finished deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4 (duration: 00m 10s)
* 18:25 hashar@deploy1001: Started deploy [integration/docroot@da43ad4]: Add Shellbox to doc.wm.o , misc build related changes fdf0917..da43ad4
* 18:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 18:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 18:15 dpifke@deploy1001: Finished deploy [performance/arc-lamp@e24f319]: Re-deploying ArcLamp to webperf1002 (duration: 00m 05s)
* 18:15 dpifke@deploy1001: Started deploy [performance/arc-lamp@e24f319]: Re-deploying ArcLamp to webperf1002
* 18:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2301.codfw.wmnet
* 18:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1406.eqiad.wmnet
* 18:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2216.codfw.wmnet with reason: REIMAGE
* 18:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1407.eqiad.wmnet
* 18:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2216.codfw.wmnet with reason: REIMAGE
* 18:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1407.eqiad.wmnet
* 18:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2301.codfw.wmnet
* 18:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1406.eqiad.wmnet
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2215.codfw.wmnet with reason: REIMAGE
* 17:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2215.codfw.wmnet with reason: REIMAGE
* 17:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2301.codfw.wmnet with reason: REIMAGE
* 17:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2301.codfw.wmnet with reason: REIMAGE
* 17:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1406.eqiad.wmnet with reason: REIMAGE
* 17:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1407.eqiad.wmnet with reason: REIMAGE
* 17:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1406.eqiad.wmnet with reason: REIMAGE
* 17:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1407.eqiad.wmnet with reason: REIMAGE
* 17:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 17:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 16:54 elukey@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 16:40 elukey@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 16:38 jayme@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 16:21 jayme@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 16:18 moritzm: installing python-bottle security updates
* 15:42 elukey: umount /var/hadoop/data/r on an-worker1099 and restart hadoop daemons - [[phab:T273034|T273034]]
* 15:31 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate 5 NavigationTiming schemas to Event Platform on group0 and group1 - [[phab:T271208|T271208]] (duration: 01m 07s)
* 15:15 godog: bounce rsyslog on centrallog1001
* 13:52 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:52 kharlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:48 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 13:48 kharlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:43 kharlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:25 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 13:20 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13989 and previous config saved to /var/cache/conftool/dbconfig/20210127-123300-root.json
* 12:25 awight: EU bacon done
* 12:25 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:658594{{!}}Enable bracket matching on the first wikis (T270238)]] (duration: 01m 07s)
* 12:20 awight@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CodeMirror: Backport: [[gerrit:658814{{!}}Improve matchbrackets performance when moving the cursor (T270317)]] (duration: 01m 06s)
* 12:19 awight@deploy1001: Synchronized php-1.36.0-wmf.28/extensions/CodeMirror: Backport: [[gerrit:658815{{!}}Improve matchbrackets performance when moving the cursor (T270317)]] (duration: 01m 14s)
* 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13988 and previous config saved to /var/cache/conftool/dbconfig/20210127-121756-root.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13987 and previous config saved to /var/cache/conftool/dbconfig/20210127-120253-root.json
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13986 and previous config saved to /var/cache/conftool/dbconfig/20210127-114749-root.json
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: After upgrading the kernel', diff saved to https://phabricator.wikimedia.org/P13985 and previous config saved to /var/cache/conftool/dbconfig/20210127-113245-root.json
* 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for kernel upgrade and enablement of report_host', diff saved to https://phabricator.wikimedia.org/P13984 and previous config saved to /var/cache/conftool/dbconfig/20210127-105735-marostegui.json
* 10:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2004.codfw.wmnet
* 10:23 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2004.codfw.wmnet
* 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2003.codfw.wmnet
* 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with final weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13982 and previous config saved to /var/cache/conftool/dbconfig/20210127-102042-marostegui.json
* 10:18 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema2003.codfw.wmnet
* 10:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1004.eqiad.wmnet
* 10:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1004.eqiad.wmnet
* 10:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1003.eqiad.wmnet
* 10:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host schema1003.eqiad.wmnet
* 10:05 elukey: reboot matomo1002 for kernel upgrades
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13981 and previous config saved to /var/cache/conftool/dbconfig/20210127-100220-marostegui.json
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13980 and previous config saved to /var/cache/conftool/dbconfig/20210127-093802-marostegui.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13979 and previous config saved to /var/cache/conftool/dbconfig/20210127-091909-marostegui.json
* 09:04 jbond42: deploy fix to enable-puppet
* 09:03 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with more weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13978 and previous config saved to /var/cache/conftool/dbconfig/20210127-083618-marostegui.json
* 08:29 marostegui: Stop mysql on db1089 to clone db1169 [[phab:T258361|T258361]]
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 to clone db1169 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13976 and previous config saved to /var/cache/conftool/dbconfig/20210127-082826-marostegui.json
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P13975 and previous config saved to /var/cache/conftool/dbconfig/20210127-081150-marostegui.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P13974 and previous config saved to /var/cache/conftool/dbconfig/20210127-080753-marostegui.json
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13973 and previous config saved to /var/cache/conftool/dbconfig/20210127-080645-root.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1160 some more small weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13972 and previous config saved to /var/cache/conftool/dbconfig/20210127-075715-marostegui.json
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13971 and previous config saved to /var/cache/conftool/dbconfig/20210127-075142-root.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13970 and previous config saved to /var/cache/conftool/dbconfig/20210127-073638-root.json
* 07:26 elukey: powercycle analytics1073 - kernel soft lock up bug registered, os needs a reboot
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After moving clouddb replicas', diff saved to https://phabricator.wikimedia.org/P13969 and previous config saved to /var/cache/conftool/dbconfig/20210127-072135-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13968 and previous config saved to /var/cache/conftool/dbconfig/20210127-070502-marostegui.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1160 some more small weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13967 and previous config saved to /var/cache/conftool/dbconfig/20210127-065715-marostegui.json
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1160 some more small weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13966 and previous config saved to /var/cache/conftool/dbconfig/20210127-063930-marostegui.json
* 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1160 with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13965 and previous config saved to /var/cache/conftool/dbconfig/20210127-061336-marostegui.json
* 06:03 twentyafterfour: phabricator appears to be up and running fine
* 06:03 twentyafterfour: phabricator is read-write
* 06:01 twentyafterfour: phabricator is read-only
* 06:00 marostegui: m3 master restart, phabricator will go read-only - [[phab:T272596|T272596]]
* 05:50 marostegui: Deploy schema change on s3 [[phab:T270055|T270055]]
* 03:48 ryankemper: (Restarted `wdqs-blazegraph` on `wdqs1012`)
* 02:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@9c85a21]: transfer_to_es: start date 2020 -> 2021 (duration: 02m 59s)
* 02:21 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@9c85a21]: transfer_to_es: start date 2020 -> 2021
* 01:58 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 01:57 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 01:57 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 01:56 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@6c6b2cb]: 0.3.61 (duration: 07m 50s)
* 01:50 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.61` on canary `wdqs1003`; proceeding to rest of fleet
* 01:48 ryankemper@deploy1001: Started deploy [wdqs/wdqs@6c6b2cb]: 0.3.61
* 01:48 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.61`. Pre-deploy tests passing on canary `wdqs1003`
* 01:39 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ee948e0]: transfer_to_es: Enable catchup (duration: 01m 11s)
* 01:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ee948e0]: transfer_to_es: Enable catchup
* 01:25 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2296.codfw.wmnet
* 01:25 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2295.codfw.wmnet
* 01:24 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Roll-out complete. Will monitor `wdqs-internal` for any issues. All the remaining `WDQS SPARQL` alerts should clear shortly
* 01:21 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Test queries to `wdqs1003.eqiad.wmnet` passed, and metrics in Grafana (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs-internal&from=1611706751381&to=1611710190405) look good. Rolling out to rest of fleet
* 01:21 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2296.codfw.wmnet
* 01:20 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2295.codfw.wmnet
* 01:14 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@246b640]: remove link recommendations from hourly transfer deps (duration: 03m 31s)
* 01:10 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@246b640]: remove link recommendations from hourly transfer deps
* 00:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2296.codfw.wmnet with reason: REIMAGE
* 00:52 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2295.codfw.wmnet with reason: REIMAGE
* 00:51 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Fixed typo in private key in commit `ea152df802b55e939d34494a4965ed83a80a24f2`. Puppet run on `wdqs1003` was successful as a result. Monitoring...
* 00:49 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2295.codfw.wmnet with reason: REIMAGE
* 00:49 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2296.codfw.wmnet with reason: REIMAGE
* 00:45 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Discovered source of the above failure; the secret key in the puppetmaster `/srv/private` repo has a typo in its name (my error): it had `wqds` instead of `wdqs`. Opening up a patch now
* 00:45 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] `...Error while evaluating a Function Call, secret(): invalid secret ssl/wdqs-internal.discovery.wmnet.key (file: /etc/puppet/modules/sslcert/manifests/certificate.pp, line: 91, column: 26) (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 129) on node wdqs1003.eqiad.wmnet`
* 00:36 ryankemper: [Deploy envoy for `wdqs-internal`] `...Error while evaluating a Function Call, secret(): invalid secret ssl/wdqs-internal.discovery.wmnet.key (file: /etc/puppet/modules/sslcert/manifests/certificate.pp, line: 91, column: 26) (file: /etc/puppet/modules/profile/manifests/tlsproxy/envoy.pp, line: 129) on node wdqs1003.eqiad.wmnet`
* 00:20 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Disabled puppet on all `wdqs-internal` hosts; merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/657913
* 00:16 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2008.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:16 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2008.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper: [[phab:T272713|T272713]] [Deploy envoy for `wdqs-internal`] Downtimed all `wdqs-internal` hosts on icinga
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2006.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2006.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1008.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:15 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1008.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1003.eqiad.wmnet with reason: Enabling envoy for wdqs-internal
* 00:14 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1003.eqiad.wmnet with reason: Enabling envoy for wdqs-internal


== 2021-07-30 ==
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 11:23 moritzm: installing libsndfile security updates on stretch
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json

== 2021-01-26 ==
* 23:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2297.codfw.wmnet
* 23:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2298.codfw.wmnet
* 23:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2302.codfw.wmnet
* 23:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1264.eqiad.wmnet
* 23:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2297.codfw.wmnet
* 23:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1264.eqiad.wmnet
* 23:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2298.codfw.wmnet
* 23:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2299.codfw.wmnet
* 23:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2302.codfw.wmnet
* 22:35 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a276626]: correct execution_date_fn in ores_predictions_hourly (duration: 01m 07s)
* 22:34 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a276626]: correct execution_date_fn in ores_predictions_hourly
* 22:30 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2300.codfw.wmnet
* 22:27 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2300.codfw.wmnet
* 22:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
* 22:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1264.eqiad.wmnet with reason: REIMAGE
* 22:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2297.codfw.wmnet with reason: REIMAGE
* 22:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2297.codfw.wmnet with reason: REIMAGE
* 22:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2298.codfw.wmnet with reason: REIMAGE
* 22:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2299.codfw.wmnet with reason: REIMAGE
* 22:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2298.codfw.wmnet with reason: REIMAGE
* 22:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2299.codfw.wmnet with reason: REIMAGE
* 21:58 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2300.codfw.wmnet with reason: REIMAGE
* 21:56 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2300.codfw.wmnet with reason: REIMAGE
* 21:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2306.codfw.wmnet
* 21:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2304.codfw.wmnet
* 21:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2302.codfw.wmnet with reason: REIMAGE
* 21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2304.codfw.wmnet
* 21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2306.codfw.wmnet
* 21:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2302.codfw.wmnet with reason: REIMAGE
* 21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1338.eqiad.wmnet
* 21:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1337.eqiad.wmnet
* 21:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1338.eqiad.wmnet
* 21:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw13388.eqiad.wmnet
* 21:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet
* 21:31 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@a87a69a]: correct alter table syntax to create wbitem table (duration: 03m 09s)
* 21:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2308.codfw.wmnet
* 21:28 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2308.codfw.wmnet
* 21:27 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@a87a69a]: correct alter table syntax to create wbitem table
* 21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2304.codfw.wmnet with reason: REIMAGE
* 21:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2304.codfw.wmnet with reason: REIMAGE
* 21:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 21:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2306.codfw.wmnet with reason: REIMAGE
* 21:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2306.codfw.wmnet with reason: REIMAGE
* 21:06 ebernhardson: restart airflow-scheduler and airflow-webserver on an-airflow1001 post-deploy
* 21:05 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@2662ca2]: ship hourly link recommendations (duration: 08m 30s)
* 20:57 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@2662ca2]: ship hourly link recommendations
* 20:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 20:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate 5 NavigationTiming schemas to Event Platform on testwiki - [[phab:T271208|T271208]] (duration: 01m 17s)
* 20:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2308.codfw.wmnet with reason: REIMAGE
* 20:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1413.eqiad.wmnet
* 20:52 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission
* 20:52 ryankemper: [[phab:T272444|T272444]] (Decommission relforge100[1,2]) Beginning decommission of `relforge1002`: `sudo -i cookbook sre.hosts.decommission relforge1002.eqiad.wmnet -t [[phab:T272444|T272444]]`
* 20:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2308.codfw.wmnet with reason: REIMAGE
* 20:50 dancy: group0 rolled back to 1.36.0-wmf.27
* 20:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1411.eqiad.wmnet
* 20:50 dancy@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1338.eqiad.wmnet with reason: REIMAGE
* 20:47 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1411.eqiad.wmnet
* 20:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1337.eqiad.wmnet with reason: REIMAGE
* 20:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1338.eqiad.wmnet with reason: REIMAGE
* 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1337.eqiad.wmnet with reason: REIMAGE
* 20:42 dancy@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.28
* 20:40 ryankemper: [[phab:T272444|T272444]] (Decommission relforge100[1,2]) Beginning decommission of `relforge1001`: `sudo -i cookbook sre.hosts.decommission relforge1001.eqiad.wmnet -t [[phab:T272444|T272444]]`
* 20:40 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission
* 20:39 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 20:39 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission
* 20:37 ryankemper: [[phab:T272444|T272444]] (Decommission relforge100[1,2]) Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/657453 prior to running decom cookbook
* 20:36 ryankemper: [[phab:T272444|T272444]] (Decommission relforge100[1,2]) Downtimed `relforge100[1,2]` in Icinga cookbook for the next 26 hours
* 20:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
* 20:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
* 20:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1411.eqiad.wmnet with reason: REIMAGE
* 20:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1411.eqiad.wmnet with reason: REIMAGE
* 20:13 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2321.codfw.wmnet
* 20:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2321.codfw.wmnet
* 20:03 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1409.eqiad.wmnet
* 20:01 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1408.eqiad.wmnet
* 19:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1408.eqiad.wmnet
* 19:53 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2317.codfw.wmnet
* 19:49 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2317.codfw.wmnet
* 19:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1409.eqiad.wmnet
* 19:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1337.eqiad.wmnet
* 19:18 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2317.codfw.wmnet with reason: REIMAGE
* 19:16 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2317.codfw.wmnet with reason: REIMAGE
* 19:13 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2321.codfw.wmnet
* 18:58 moritzm: installing sudo security updates on Jessie
* 18:57 moritzm: uploaded sudo 1.8.10p3-1+deb8u7+wmf1 to apt.wikimedia.org
* 18:46 dancy@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.28 (duration: 40m 09s)
* 18:37 moritzm: installing sudo security updates on Stretch
* 18:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming after rebuild
* 18:22 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming after rebuild
* 18:15 moritzm: installing sudo security updates on Buster
* 18:07 dancy@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.28
* 17:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1408.eqiad.wmnet with reason: REIMAGE
* 17:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1409.eqiad.wmnet with reason: REIMAGE
* 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1408.eqiad.wmnet with reason: REIMAGE
* 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1409.eqiad.wmnet with reason: REIMAGE
* 17:19 mutante: ms-be1028 - running puppet to clear ferm icinga alert
* 17:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1337.eqiad.wmnet with reason: REIMAGE
* 17:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2321.codfw.wmnet with reason: REIMAGE
* 17:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1337.eqiad.wmnet with reason: REIMAGE
* 17:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2321.codfw.wmnet with reason: REIMAGE
* 16:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1408.eqiad.wmnet with reason: REIMAGE
* 16:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1409.eqiad.wmnet with reason: REIMAGE
* 16:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1408.eqiad.wmnet with reason: REIMAGE
* 16:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1409.eqiad.wmnet with reason: REIMAGE
* 16:50 marostegui: Deploy schema change on testwiki - [[phab:T272953|T272953]]
* 16:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 16:43 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 16:43 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 16:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: REIMAGE
* 16:42 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 16:42 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 16:42 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 16:42 mutante: reimaging l33t jobrunner mw1337
* 16:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: REIMAGE
* 16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 16:38 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 16:02 moritzm: installing mutt security updates on buster
* 14:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: REIMAGE
* 14:56 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: REIMAGE
* 14:44 hnowlan: reimaging maps1009 as new buster master
* 14:23 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'production' .
* 14:23 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 14:23 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 14:22 akosiaris: restart pybal on lvs1015, lvs1016, lvs2009, lvs2010 for picking up linkrecommendation, similar-users, apertium-tls LVS services.
* 14:21 marostegui: Install mariadb 10.4.18 on pc2010 - [[phab:T268457|T268457]]
* 14:13 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 14:07 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 14:05 marostegui: Restart db1077
* 14:03 godog: swift codfw-prod decrease SSD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 13:41 arturo: admin update some kubernetes-related packages in buster-wikimedia/thirdparty/kubeadm-k8s-1-17 ([[phab:T263284|T263284]])
* 13:30 hashar: Upgraded and restarting Jenkins on release1002 / releases2002 / contint1001 and contint2001
* 12:34 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=zhwiki --fix # [[phab:T271612|T271612]] # P13960
* 12:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11cfef4f05612771d6a7cbe27f9bb1fbb41e0e5d}}: Add WikiProject and WikiProject_talk namespace and its aliases for zh.wikipedia ([[phab:T271612|T271612]]) (duration: 01m 01s)
* 12:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|080389dbac5bb2cddab7640071e43674a868e945}}: Add localized Wikivoyage wordmark for the mobile view of Turkish Wikivoyage ([[phab:T272776|T272776]]; 2/2) (duration: 01m 02s)
* 12:24 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikivoyage-wordmark-tr.svg: {{Gerrit|080389dbac5bb2cddab7640071e43674a868e945}}: Add localized Wikivoyage wordmark for the mobile view of Turkish Wikivoyage ([[phab:T272776|T272776]]; 1/2) (duration: 01m 01s)
* 12:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4dfc28a4a759050726561da861a9e1030b529d3e}}: Add Turkish Powered by MediaWiki and A Wikimedia project icons for Turkish Wikivoyage ([[phab:T272781|T272781]]) (duration: 01m 00s)
* 12:12 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=trwikivoyage --cluster=all
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eab535fcc983d57dd36c41309162ace8aadcae1a}}: Add namespace aliases to Turkish Wikivoyage ([[phab:T272782|T272782]]) (duration: 01m 00s)
* 11:47 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 11:46 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 11:44 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 11:29 moritzm: imported jenkins 2.263.3 to apt.wikimedia.org (thirdparty/ci)
* 09:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
* 09:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
* 09:37 elukey: reboot dbstore1005 for kernel upgrades
* 09:34 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: Resync: Some mw2xxx hosts have old version (duration: 00m 55s)
* 09:32 godog: disable mdadm check emails on ms-be1022 / known, and host is going to be decom'd - [[phab:T267870|T267870]]
* 09:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: Restart mariadb to pick up config changes [[phab:T272957|T272957]]
* 09:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: Restart mariadb to pick up config changes [[phab:T272957|T272957]]
* 09:28 elukey: reboot dbstore1003 for kernel upgrades
* 09:24 urbanecm@deploy1001: Synchronized wmf-config/logos.php: Resyncing to fix mw2xxx apache loading (duration: 00m 57s)
* 09:14 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 09:14 elukey: reboot dbstore1004 for kernel upgrades
* 09:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eab87780}}: frwiki: Fix tagline height and width ([[phab:T272907|T272907]]) (duration: 00m 58s)
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078 (db1175 isn't ready yet)', diff saved to https://phabricator.wikimedia.org/P13959 and previous config saved to /var/cache/conftool/dbconfig/20210126-091236-marostegui.json
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to clone db1175 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P13958 and previous config saved to /var/cache/conftool/dbconfig/20210126-091149-marostegui.json
* 09:06 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 08:53 marostegui: Stop mysql on db1081 to clone db1160
* 08:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
* 08:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
* 08:38 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1119,1131].eqiad.wmnet
* 08:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host furud.codfw.wmnet
* 08:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1119,1131].eqiad.wmnet
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host furud.codfw.wmnet
* 08:31 godog: swift start decom for ms-be20[17,19,21,23,24,25,26,27] - [[phab:T272837|T272837]]
* 08:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1119.eqiad.wmnet with reason: REIMAGE
* 08:26 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1131.eqiad.wmnet with reason: REIMAGE
* 08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1131.eqiad.wmnet with reason: REIMAGE
* 08:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1119.eqiad.wmnet with reason: REIMAGE
* 08:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 08:18 moritzm: upgrading OpenJDK on aqs and Hadoop systems
* 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1081 (s4 old master) - [[phab:T271427|T271427]]', diff saved to https://phabricator.wikimedia.org/P13955 and previous config saved to /var/cache/conftool/dbconfig/20210126-070443-marostegui.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 master and remove read-only from s4 [[phab:T271427|T271427]]', diff saved to https://phabricator.wikimedia.org/P13954 and previous config saved to /var/cache/conftool/dbconfig/20210126-070152-marostegui.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s4 as read-only for maintenance [[phab:T271427|T271427]]', diff saved to https://phabricator.wikimedia.org/P13953 and previous config saved to /var/cache/conftool/dbconfig/20210126-070037-marostegui.json
* 07:00 marostegui: Starting s4 eqiad failover from db1081 to db1138 - [[phab:T271427|T271427]]
* 06:55 ryankemper: Restarted `wdqs-blazegraph` on `wdqs1005` - its blazegraph was deadlocked (based on the presence of null values for the blazegraph metrics for that host)
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Set candidate master to weight 0 before the failover [[phab:T271427|T271427]]', diff saved to https://phabricator.wikimedia.org/P13952 and previous config saved to /var/cache/conftool/dbconfig/20210126-054337-marostegui.json
* 00:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2331.codfw.wmnet
* 00:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2318.codfw.wmnet
* 00:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2319.codfw.wmnet
* 00:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2320.codfw.wmnet
* 00:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2331.codfw.wmnet
* 00:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2318.codfw.wmnet
* 00:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2319.codfw.wmnet
* 00:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2320.codfw.wmnet
* 00:34 legoktm@deploy1001: Synchronized wmf-config/CommonSettings.php: Invalidate configuration cache when logos.php is touched too (duration: 00m 56s)
* 00:32 legoktm@deploy1001: Synchronized wmf-config/logos.php: Add script to mostly automate logo management (duration: 00m 55s)
* 00:16 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Split $wmgSiteLogo<nowiki>{</nowiki>1,1_5,2<nowiki>}</nowiki>x to a separate logos.php (2/2) (duration: 01m 00s)
* 00:14 legoktm@deploy1001: Synchronized wmf-config/logos.php: Split $wmgSiteLogo<nowiki>{</nowiki>1,1_5,2<nowiki>}</nowiki>x to a separate logos.php (1/2) (duration: 00m 56s)
* 00:08 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T272920|T272920]]: arbcom_enwiki: Change favicon to a renamed copy of arbcom_ruwiki.ico (2/2) (duration: 00m 58s)
* 00:07 legoktm@deploy1001: Synchronized static/favicon/arbcom_enwiki.ico: [[phab:T272920|T272920]]: arbcom_enwiki: Change favicon to a renamed copy of arbcom_ruwiki.ico (1/2) (duration: 01m 00s)


== 2021-01-25 ==
== 2021-07-29 ==
* 23:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2318.codfw.wmnet with reason: REIMAGE
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2319.codfw.wmnet with reason: REIMAGE
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 23:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2318.codfw.wmnet with reason: REIMAGE
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 23:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2319.codfw.wmnet with reason: REIMAGE
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 23:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2320.codfw.wmnet with reason: REIMAGE
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 23:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 22:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2320.codfw.wmnet with reason: REIMAGE
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 22:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1338.eqiad.wmnet
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 22:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2322.codfw.wmnet
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 22:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2323.codfw.wmnet
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 22:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2322.codfw.wmnet
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 22:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2323.codfw.wmnet
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 22:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1338.eqiad.wmnet
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 21:45 cstone: civicrm revision changed from {{Gerrit|3afb54f6f9}} to {{Gerrit|dfb2ea2148}}
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 21:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 21:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1338.eqiad.wmnet with reason: REIMAGE
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 21:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 21:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1338.eqiad.wmnet with reason: REIMAGE
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 20:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2326.codfw.wmnet
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 20:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2326.codfw.wmnet
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 20:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1410.eqiad.wmnet
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 20:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1410.eqiad.wmnet
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 20:35 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 20:35 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 20:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2323.codfw.wmnet with reason: REIMAGE
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 20:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2322.codfw.wmnet with reason: REIMAGE
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2323.codfw.wmnet with reason: REIMAGE
* 14:11 vgutierrez: restart pybal on lvs2009
* 20:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2322.codfw.wmnet with reason: REIMAGE
* 14:09 vgutierrez: restart pybal on lvs2010
* 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2326.codfw.wmnet with reason: REIMAGE
* 14:07 vgutierrez: restart pybal on lvs2008
* 20:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2326.codfw.wmnet with reason: REIMAGE
* 14:05 vgutierrez: restart pybal on lvs2007
* 20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1410.eqiad.wmnet with reason: REIMAGE
* 13:59 vgutierrez: restart pybal on lvs1014
* 20:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1410.eqiad.wmnet with reason: REIMAGE
* 13:55 vgutierrez: restart pybal on lvs1015
* 20:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2324.codfw.wmnet
* 13:52 _joe_: restarting pybal on lvs1016
* 19:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 19:54 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 19:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1412.eqiad.wmnet
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 19:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1411.eqiad.wmnet
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 19:52 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 19:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2324.codfw.wmnet
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 19:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2326.codfw.wmnet
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 19:48 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 19:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw14124.eqiad.wmnet
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 19:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1411.eqiad.wmnet
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 19:44 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 19:44 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 19:44 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 19:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 19:41 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 19:41 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 19:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 19:37 tgr_: Morning deploys done
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 19:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:36 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 07:52 moritzm: restarting Tomcat on idp-test
* 19:29 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 19:29 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 19:25 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:658356{{!}}Enables MediaWiki client error instrument on English Wikipedia (T255585)]] (duration: 01m 01s)
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 19:20 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:657292{{!}}[beta] GrowthExperiments: set link recommendation feature flags ()]] (duration: 01m 06s)
* 19:00 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2324.codfw.wmnet with reason: REIMAGE
* 18:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2326.codfw.wmnet with reason: REIMAGE
* 18:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2326.codfw.wmnet with reason: REIMAGE
* 18:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1412.eqiad.wmnet with reason: REIMAGE
* 18:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1411.eqiad.wmnet with reason: REIMAGE
* 18:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1412.eqiad.wmnet with reason: REIMAGE
* 18:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1411.eqiad.wmnet with reason: REIMAGE
* 16:40 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
* 16:07 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
* 15:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
* 15:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
* 15:42 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
* 15:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Downtiming for rebuild
* 15:23 dcausse@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CirrusSearch/: revert: Add an option to limit the size of the file_text field: [[phab:T271493|T271493]] (duration: 01m 05s)
* 15:20 dcausse@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/CirrusSearch/: Add an option to limit the size of the file_text field: [[phab:T271493|T271493]] (duration: 00m 58s)
* 15:16 dcausse: re-opening EU Backport window to ship pending patches
* 15:10 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 15:09 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 14:37 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove 2 migrated EventLoggingSchemas overrides - [[phab:T259163|T259163]], [[phab:T267352|T267352]] (duration: 00m 56s)
* 14:35 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 14:34 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:31 akosiaris@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:28 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:28 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:25 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
* 12:47 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|6a4cbe662655edaa4f6c36e69877766a6a48d828}}: Revert "Switch fiwiki to their 500k temporary logo!": delete temporary logo files (duration: 00m 57s)
* 12:41 urbanecm@deploy1001: Synchronized wmf-config/MetaContactPages.php: {{Gerrit|7a6a60fcaa635a8f891a6d09f3611f8620490497}}: Create Contact page for Ombuds commission at Meta-Wiki ([[phab:T271828|T271828]]) (duration: 01m 00s)
* 12:41 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=arbcom_ruwiki --fix # [[phab:T272292|T272292]]
* 12:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|833833385f1cf02a4578edb9b5108d173bdf30bd}}: Adding namespace aliases on arbcom-ruwiki ([[phab:T272292|T272292]]) (duration: 00m 57s)
* 12:30 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateCollation.php --wiki=trwikivoyage --previous-collation=uppercase # [[phab:T272783|T272783]]
* 12:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bcc7ad7acf721a5e0521bbecfe6df8671ac1822c}}: Set $wgCategoryCollation = uca-tr on trwikivoyage ([[phab:T272783|T272783]]) (duration: 00m 57s)
* 12:27 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|d34cb3205a58d5ac50800f2f218af6213f74f5e7}}: Resize the logo of Turkish Wikivoyage ([[phab:T272784|T272784]]) (duration: 00m 54s)
* 12:23 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|177339d96616b5941dbeb2c90ca6aa0be90e3b5a}}: Defining wgSitename for trwikivoyage ([[phab:T272779|T272779]]) (duration: 01m 00s)
* 12:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|89d072378e16b0410d963deca2fd766c1406b5b6}}: Enable SandboxLink on Turkish Wikivoyage ([[phab:T272780|T272780]]) (duration: 01m 05s)
* 12:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|75aa32fd5aee1feebe8a97360068da55cbcf06d8}}: frwiki: Change back to normal logo ([[phab:T272700|T272700]]) (duration: 01m 07s)
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|693eaec20a24620c2a709c8bac707c0d7af3436b}}: Add bidgee.id.au to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T272202|T272202]]) (duration: 01m 01s)
* 11:40 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:658242{{!}} Bumping portals to master (T128546)]] (duration: 00m 55s)
* 11:39 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:658242{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 11:35 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 11:33 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 11:11 godog: thanos delete old orphaned blocks with replica=unset label
* 10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
* 10:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
* 10:44 godog: swift decrease weight for ms-be20[16,18,20,22] - [[phab:T272837|T272837]]
* 10:00 moritzm: installing imagemagick security updates on stretch
* 09:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1002.wikimedia.org
* 09:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader1002.wikimedia.org
* 09:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2002.wikimedia.org
* 09:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader2002.wikimedia.org
* 09:40 godog: bounce apache2 on logstash1024, stuck on high cpu
* 09:21 marostegui@deploy1001: Synchronized wmf-config/etcd.php: Add x2 to the mapping array [[phab:T269324|T269324]] (duration: 00m 58s)
* 09:17 moritzm: installing samba security updates on stretch
* 09:15 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Add x2 to the mapping array [[phab:T269324|T269324]] (duration: 01m 01s)
* 09:06 ema: cp3054: install varnish 6.0.1-1wm2 -- 6.0.1 without https://github.com/varnishcache/varnish-cache/pull/2705 [[phab:T264398|T264398]]
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: After upgrading its kernel', diff saved to https://phabricator.wikimedia.org/P13944 and previous config saved to /var/cache/conftool/dbconfig/20210125-084715-root.json
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: After upgrading its kernel', diff saved to https://phabricator.wikimedia.org/P13943 and previous config saved to /var/cache/conftool/dbconfig/20210125-083211-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: After upgrading its kernel', diff saved to https://phabricator.wikimedia.org/P13942 and previous config saved to /var/cache/conftool/dbconfig/20210125-081708-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: After upgrading its kernel', diff saved to https://phabricator.wikimedia.org/P13941 and previous config saved to /var/cache/conftool/dbconfig/20210125-080204-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13940 and previous config saved to /var/cache/conftool/dbconfig/20210125-073322-marostegui.json
* 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Add x2 eqiad to dbctl [[phab:T269324|T269324]]', diff saved to https://phabricator.wikimedia.org/P13939 and previous config saved to /var/cache/conftool/dbconfig/20210125-064419-marostegui.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Populate x2 eqiad hosts into dbctl [[phab:T269324|T269324]]', diff saved to https://phabricator.wikimedia.org/P13938 and previous config saved to /var/cache/conftool/dbconfig/20210125-064305-marostegui.json


== 2021-01-23 ==
== 2021-07-28 ==
* 22:21 volker-e@deploy1001: Finished deploy [design/style-guide@63e39e7]: Deploy design/style-guide: {{Gerrit|63e39e7}} “Components”: Amend button groups states SVG font stack (#427) (duration: 00m 06s)
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 22:21 volker-e@deploy1001: Started deploy [design/style-guide@63e39e7]: Deploy design/style-guide: {{Gerrit|63e39e7}} “Components”: Amend button groups states SVG font stack (#427)
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 04:05 ryankemper: Depooled `wdqs1013` (it has ~50 mins of lag to catch up on, and also the bad gateway above)
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 04:03 ryankemper: Restarted `wdqs-blazegraph` on `wdqs1013`: `sudo systemctl restart wdqs-blazegraph`
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2332.codfw.wmnet
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 01:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2328.codfw.wmnet
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 01:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2332.codfw.wmnet
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 01:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2328.codfw.wmnet
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 01:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2330.codfw.wmnet
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 01:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2334.codfw.wmnet
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 01:48 foks: reset user email for Davey2010
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 01:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 01:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1268.eqiad.wmnet
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 01:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2330.codfw.wmnet
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 01:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2334.codfw.wmnet
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 01:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 01:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1413.eqiad.wmnet
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 00:46 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch enwiki to use enwiki20 "Option A" logo variant ([[phab:T272526|T272526]]) (duration: 00m 57s)
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 00:36 legoktm@deploy1001: Synchronized static/images/project-logos/: Add enwiki20 "Option A" fixed logos ([[phab:T272526|T272526]]) (duration: 00m 59s)
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 13:29 moritzm: installing python2.7 security updates on stretch
* 13:08 moritzm: installing python3.5 security updates on stretch
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:27 moritzm: installing nginx security updates on thumbor*
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:27 Amir1: running several long-running queries against pc1007
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 07:53 moritzm: installing aspell security updates on stretch
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php


== 2021-01-22 ==
== 2021-07-27 ==
* 22:41 reedy@deploy1001: Synchronized invalid.json: (no justification provided) (duration: 00m 58s)
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 20:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 20:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1268.eqiad.wmnet with reason: REIMAGE
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 20:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2330.codfw.wmnet with reason: REIMAGE
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 20:05 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2332.codfw.wmnet with reason: REIMAGE
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 20:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2328.codfw.wmnet with reason: REIMAGE
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 20:01 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 20:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2334.codfw.wmnet with reason: REIMAGE
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1413.eqiad.wmnet with reason: REIMAGE
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2328.codfw.wmnet with reason: REIMAGE
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2330.codfw.wmnet with reason: REIMAGE
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2332.codfw.wmnet with reason: REIMAGE
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2334.codfw.wmnet with reason: REIMAGE
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2356.codfw.wmnet
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2354.codfw.wmnet
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2352.codfw.wmnet
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 19:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2350.codfw.wmnet
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2352.codfw.wmnet
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2350.codfw.wmnet
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 19:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2354.codfw.wmnet
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 19:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2356.codfw.wmnet
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 19:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 19:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 19:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2350.codfw.wmnet with reason: REIMAGE
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 19:09 mutante: releases1002 systemctl reset-failed
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 19:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 19:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2352.codfw.wmnet with reason: REIMAGE
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 19:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2354.codfw.wmnet with reason: REIMAGE
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 19:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2356.codfw.wmnet with reason: REIMAGE
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2364.codfw.wmnet
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2362.codfw.wmnet
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 18:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2360.codfw.wmnet
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 18:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2358.codfw.wmnet
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 18:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2362.codfw.wmnet
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 18:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2364.codfw.wmnet
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 18:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2360.codfw.wmnet
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 18:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2358.codfw.wmnet
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 18:17 mutante: releases2002 - rebooting to confirm works now and also new disk gets auto-mounted
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 18:03 mutante: releases1002 - replaced ens5 with ens6 in /etc/network/interfaces and rebooted again
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 18:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on releases1002.eqiad.wmnet with reason: fixing networking - added disk
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 17:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 17:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2360.codfw.wmnet with reason: new install on buster
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 17:57 mutante: releases1002 (releases.wm.org active backend) - rebooting - hopefully it does not run into [[phab:T272555|T272555]] but if it does now it's known how to fix
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 17:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 17:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 17:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 17:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 17:52 mutante: releases2001 - create new partition table with fdisk, make ext4 filesystem on /dev/vdb1
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2364.codfw.wmnet with reason: REIMAGE
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 17:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2362.codfw.wmnet with reason: REIMAGE
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 17:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2360.codfw.wmnet with reason: REIMAGE
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 17:49 ppchelko@deploy1001: Finished deploy [restbase/deploy@e54225d]: [[phab:T270411|T270411]] [[phab:T270415|T270415]] [[phab:T270281|T270281]] [[phab:T270277|T270277]] (duration: 65m 37s)
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 17:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2358.codfw.wmnet with reason: REIMAGE
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 17:29 mforns@deploy1001: Finished deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 00m 07s)
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 17:29 mforns@deploy1001: Started deploy [analytics/refinery@eea071d] (thin): Extra bug-fix train THIN [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253]
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 17:23 mforns@deploy1001: Finished deploy [analytics/refinery@eea071d]: Extra bug-fix train [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253] (duration: 10m 03s)
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 17:13 mforns@deploy1001: Started deploy [analytics/refinery@eea071d]: Extra bug-fix train [analytics/refinery@eea071def90a8a856b1e04dda23b77a850134253]
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 16:44 ppchelko@deploy1001: Started deploy [restbase/deploy@e54225d]: [[phab:T270411|T270411]] [[phab:T270415|T270415]] [[phab:T270281|T270281]] [[phab:T270277|T270277]]
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 16:40 cmjohnson1: replacing optics/fiber pfw3a-eqiad:xe-0/0/17 and fasw-c1a-eqiad:xe-0/2/0 [[phab:T271295|T271295]]
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 16:19 jynus: restart of backup source hosts on codfw [[phab:T271913|T271913]]
* 14:33 mmandere: depool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 15:54 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 15:40 moritzm: installing puppetboard1002
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 15:24 moritzm: installing puppetboard2002
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 13:44 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13932 and previous config saved to /var/cache/conftool/dbconfig/20210122-134444-kormat.json
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P13931 and previous config saved to /var/cache/conftool/dbconfig/20210122-133341-marostegui.json
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 13:31 marostegui: Stop replication on db1121
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P13930 and previous config saved to /var/cache/conftool/dbconfig/20210122-133044-marostegui.json
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:29 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13929 and previous config saved to /var/cache/conftool/dbconfig/20210122-132939-kormat.json
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:21 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard2002.codfw.wmnet
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:20 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 100%: Reboot [[phab:T272121|T272121]]', diff saved to https://phabricator.wikimedia.org/P13927 and previous config saved to /var/cache/conftool/dbconfig/20210122-132028-kormat.json
* 14:11 moritzm: installing aspell security updates
* 13:14 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13926 and previous config saved to /var/cache/conftool/dbconfig/20210122-131436-kormat.json
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:05 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 75%: Reboot [[phab:T272121|T272121]]', diff saved to https://phabricator.wikimedia.org/P13925 and previous config saved to /var/cache/conftool/dbconfig/20210122-130525-kormat.json
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 12:59 kormat@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13924 and previous config saved to /var/cache/conftool/dbconfig/20210122-125932-kormat.json
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 12:54 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host puppetboard2002.codfw.wmnet
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 12:53 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host puppetboard1002.eqiad.wmnet
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 12:50 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 50%: Reboot [[phab:T272121|T272121]]', diff saved to https://phabricator.wikimedia.org/P13923 and previous config saved to /var/cache/conftool/dbconfig/20210122-125021-kormat.json
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 12:47 kormat@cumin1001: dbctl commit (dc=all): 'db1149 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13922 and previous config saved to /var/cache/conftool/dbconfig/20210122-124748-kormat.json
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 12:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1149.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 12:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1149.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 12:43 kormat@cumin1001: dbctl commit (dc=all): 'Remove db1110 from api group [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13921 and previous config saved to /var/cache/conftool/dbconfig/20210122-124310-kormat.json
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 12:38 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host puppetboard1002.eqiad.wmnet
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 12:38 kormat@cumin1001: dbctl commit (dc=all): 'Remove db1127 from api group [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13920 and previous config saved to /var/cache/conftool/dbconfig/20210122-123832-kormat.json
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 12:35 kormat@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 25%: Reboot [[phab:T272121|T272121]]', diff saved to https://phabricator.wikimedia.org/P13919 and previous config saved to /var/cache/conftool/dbconfig/20210122-123518-kormat.json
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 12:33 volker-e@deploy1001: Finished deploy [design/style-guide@9a811b8]: Deploy design/style-guide: {{Gerrit|9a811b8}} Add Language selectors to component overview Sketch document (#424) (duration: 00m 07s)
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 12:33 volker-e@deploy1001: Started deploy [design/style-guide@9a811b8]: Deploy design/style-guide: {{Gerrit|9a811b8}} Add Language selectors to component overview Sketch document (#424)
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 12:10 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1135,1137].eqiad.wmnet
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:08 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1135,1137].eqiad.wmnet
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support.  Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 12:00 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13918 and previous config saved to /var/cache/conftool/dbconfig/20210122-120011-kormat.json
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 11:54 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 11:51 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13917 and previous config saved to /var/cache/conftool/dbconfig/20210122-115113-kormat.json
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 11:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on es1023.eqiad.wmnet with reason: Extended reboot for [[phab:T272121|T272121]]
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 11:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on es1023.eqiad.wmnet with reason: Extended reboot for [[phab:T272121|T272121]]
* 11:23 Lucas_WMDE: EU backport+config window done
* 11:46 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13916 and previous config saved to /var/cache/conftool/dbconfig/20210122-114642-kormat.json
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 11:45 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13915 and previous config saved to /var/cache/conftool/dbconfig/20210122-114507-kormat.json
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 11:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1003.eqiad.wmnet
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 11:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1023.eqiad.wmnet with reason: Reboot for [[phab:T272121|T272121]]
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 11:36 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on es1023.eqiad.wmnet with reason: Reboot for [[phab:T272121|T272121]]
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 11:36 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13914 and previous config saved to /var/cache/conftool/dbconfig/20210122-113610-kormat.json
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 11:31 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13913 and previous config saved to /var/cache/conftool/dbconfig/20210122-113139-kormat.json
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 11:30 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13912 and previous config saved to /var/cache/conftool/dbconfig/20210122-113004-kormat.json
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 11:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mwdebug1003.eqiad.wmnet
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 11:24 kormat@cumin1001: dbctl commit (dc=all): 'es1023 depooling: enable report_host [[phab:T271106|T271106]]', diff saved to https://phabricator.wikimedia.org/P13911 and previous config saved to /var/cache/conftool/dbconfig/20210122-112424-kormat.json
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 11:24 hnowlan: joining restbase2009-a to cluster
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 11:21 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13910 and previous config saved to /var/cache/conftool/dbconfig/20210122-112106-kormat.json
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1001.wikimedia.org
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 11:16 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13909 and previous config saved to /var/cache/conftool/dbconfig/20210122-111635-kormat.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 11:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader1001.wikimedia.org
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 11:15 kormat@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13908 and previous config saved to /var/cache/conftool/dbconfig/20210122-111500-kormat.json
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 11:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2001.wikimedia.org
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 11:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader2001.wikimedia.org
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 11:06 kormat@cumin1001: dbctl commit (dc=all): 'db1136 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13906 and previous config saved to /var/cache/conftool/dbconfig/20210122-110603-kormat.json
* 08:57 _joe_: repooling mw225[12] for apis
* 11:05 jbond42: deploy cairo updates to jessie
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 11:02 kormat@cumin1001: dbctl commit (dc=all): 'db1141 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13905 and previous config saved to /var/cache/conftool/dbconfig/20210122-110229-kormat.json
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 11:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1141.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 11:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1141.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 08:36 jynus: reenabled puppet on mwmaint1002
* 11:01 kormat@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13904 and previous config saved to /var/cache/conftool/dbconfig/20210122-110132-kormat.json
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 10:59 kormat@cumin1001: dbctl commit (dc=all): 'db1136 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13903 and previous config saved to /var/cache/conftool/dbconfig/20210122-105952-kormat.json
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 10:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1136.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 10:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1136.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 07:52 jynus: disabling puppet on mwmaint1002
* 10:59 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db1127 to api group [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13902 and previous config saved to /var/cache/conftool/dbconfig/20210122-105921-kormat.json
* 07:14 moritzm: installing krb security updates on buster
* 10:56 kormat@cumin1001: dbctl commit (dc=all): 'db1134 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13901 and previous config saved to /var/cache/conftool/dbconfig/20210122-105636-kormat.json
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 10:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1134.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 10:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1134.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 10:53 kormat@cumin1001: dbctl commit (dc=all): 'Remove db1088 from api group [[phab:T271106|T271106]]', diff saved to https://phabricator.wikimedia.org/P13900 and previous config saved to /var/cache/conftool/dbconfig/20210122-105345-kormat.json
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 10:52 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13899 and previous config saved to /var/cache/conftool/dbconfig/20210122-105244-kormat.json
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 10:37 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13898 and previous config saved to /var/cache/conftool/dbconfig/20210122-103741-kormat.json
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json
* 10:36 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13897 and previous config saved to /var/cache/conftool/dbconfig/20210122-103609-kormat.json
* 10:22 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13895 and previous config saved to /var/cache/conftool/dbconfig/20210122-102237-kormat.json
* 10:21 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13894 and previous config saved to /var/cache/conftool/dbconfig/20210122-102105-kormat.json
* 10:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host archiva1002.wikimedia.org
* 10:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host archiva1002.wikimedia.org
* 10:07 kormat@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13893 and previous config saved to /var/cache/conftool/dbconfig/20210122-100734-kormat.json
* 10:06 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13892 and previous config saved to /var/cache/conftool/dbconfig/20210122-100602-kormat.json
* 10:03 kormat@cumin1001: dbctl commit (dc=all): 'db1130 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13891 and previous config saved to /var/cache/conftool/dbconfig/20210122-100307-kormat.json
* 10:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1130.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 10:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1130.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 10:02 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db1110 to api group [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13890 and previous config saved to /var/cache/conftool/dbconfig/20210122-100233-kormat.json
* 09:52 moritzm: uploaded cairo 1.14.0-2.1+deb8u2+wmf1 to apt.wikimedia.org
* 09:50 kormat@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13889 and previous config saved to /var/cache/conftool/dbconfig/20210122-095058-kormat.json
* 09:44 kormat@cumin1001: dbctl commit (dc=all): 'db1093 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13888 and previous config saved to /var/cache/conftool/dbconfig/20210122-094453-kormat.json
* 09:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1093.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 09:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1093.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 09:43 kormat@cumin1001: dbctl commit (dc=all): 'Temporarily add db1088 to api group [[phab:T271106|T271106]]', diff saved to https://phabricator.wikimedia.org/P13887 and previous config saved to /var/cache/conftool/dbconfig/20210122-094337-kormat.json
* 08:49 moritzm: installing PIP security updates for stretch
* 08:44 moritzm: installing mutt updates for stretch
* 08:35 XioNoX: Remove BGP for Zayo transit in ulsfo, eqiad, eqord
* 08:33 elukey: update puppet compiler's facts
* 07:26 ryankemper: [WDQS Deploy] WDQS deploy complete; service is healthy
* 06:59 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 06:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 06:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 06:58 ryankemper: [WDQS Deploy] Initial deploy complete, `query.wikidata.org` handles queries fine, proceeding to post-deploy steps
* 06:57 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@70f9d37]: 0.3.60 (duration: 10m 43s)
* 06:50 ryankemper: [WDQS Deploy] All tests passing on canary `wdqs1003` following canary WDQS deploy, proceeding to rest of fleet
* 06:46 ryankemper@deploy1001: Started deploy [wdqs/wdqs@70f9d37]: 0.3.60
* 06:46 ryankemper: [WDQS Deploy] All tests passing on canary `wdqs1003` before WDQS deploy, beginning deploy
* 06:45 ryankemper: [wdqs] re-pooled `wdqs1013` (all caught up on lag)
* 06:16 marostegui: Stop MySQL on db1117 db2133 db2078 [[phab:T272614|T272614]]
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2143 and db2144 as x2 codfw slaves [[phab:T269324|T269324]]', diff saved to https://phabricator.wikimedia.org/P13885 and previous config saved to /var/cache/conftool/dbconfig/20210122-060147-marostegui.json
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2142 into x2 as codfw master [[phab:T269324|T269324]]', diff saved to https://phabricator.wikimedia.org/P13884 and previous config saved to /var/cache/conftool/dbconfig/20210122-060007-marostegui.json
* 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1118 weight', diff saved to https://phabricator.wikimedia.org/P13883 and previous config saved to /var/cache/conftool/dbconfig/20210122-054330-marostegui.json
* 01:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2368.codfw.wmnet
* 01:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2366.codfw.wmnet
* 01:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2374.codfw.wmnet
* 01:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2368.codfw.wmnet
* 01:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2366.codfw.wmnet
* 01:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2374.codfw.wmnet
* 01:19 Urbanecm: Evening B&C window finished
* 01:18 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/AbuseFilter/: {{Gerrit|7d8ab70d5b00142e8344e242dd085eb7bfa81145}}: Dont return the status of doBlockInternal when processing block actions (duration: 00m 59s)
* 01:16 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|376cba1b33dd68d40490a1498c59a4d430318ab1}}: Enroll idwiki in the DiscussionTools a/b test ([[phab:T268191|T268191]]) (duration: 00m 55s)
* 01:14 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/DiscussionTools/: {{Gerrit|513a7861bbcf06a8ac5c29e1b9838640cbd7c628}}: A/B test output when a specific feature is being tested ([[phab:T268191|T268191]]) (duration: 00m 55s)
* 01:12 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/WikibaseMediaInfo/: {{Gerrit|4b0259b761681ca90b3f3039019553ddca40a5fe}}: Distinguish between null continue value and unknown one ([[phab:T272548|T272548]]) (duration: 00m 59s)
* 01:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2376.codfw.wmnet
* 01:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2366.codfw.wmnet with reason: REIMAGE
* 01:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2376.codfw.wmnet
* 01:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2368.codfw.wmnet with reason: REIMAGE
* 01:00 Urbanecm: Evening B&C still in process, waiting on Zuul
* 00:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2366.codfw.wmnet with reason: REIMAGE
* 00:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
* 00:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2368.codfw.wmnet with reason: REIMAGE
* 00:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1174.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1167.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 00:49 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1168.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1165.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1170.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1169.eqiad.wmnet with reason: REIMAGE
* 00:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 00:46 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1164.eqiad.wmnet with reason: REIMAGE
* 00:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1174.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1165.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1169.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1168.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1164.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 00:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1166.eqiad.wmnet with reason: REIMAGE
* 00:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2376.codfw.wmnet with reason: REIMAGE
* 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2376.codfw.wmnet with reason: REIMAGE
* 00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2372.codfw.wmnet
* 00:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2370.codfw.wmnet
* 00:31 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
* 00:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d4f5d6f09977962be1c49471432125a92357ede6}}: Temporarily amend ukwiki AF configuration ([[phab:T272330|T272330]]) (duration: 01m 03s)
* 00:20 brennen@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/MobileFrontend: Backport: [[gerrit:657702{{!}}Fix toggling storage cleanup (T272638)]] (duration: 01m 07s)
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2372.codfw.wmnet
* 00:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2370.codfw.wmnet
* 00:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2370.codfw.wmnet with reason: new install on buster
* 00:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2370.codfw.wmnet with reason: new install on buster


== 2021-01-21 ==
== 2021-07-26 ==
* 23:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2372.codfw.wmnet with reason: REIMAGE
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 23:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2370.codfw.wmnet with reason: REIMAGE
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}},
* 23:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 23:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2370.codfw.wmnet with reason: REIMAGE
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 23:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2372.codfw.wmnet with reason: REIMAGE
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 23:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2374.codfw.wmnet with reason: REIMAGE
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 22:10 brennen: 1.36.0-wmf.27 train status: for avoidance of doubt, no deploys until further notice - sorting out [[phab:T272638|T272638]]
* 16:06 legoktm: depooled mw2336.codfw.wmnet, mgmt is down too. [[phab:T287394|T287394]]
* 21:27 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.26
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 20:06 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 20:04 razzi@cumin1001: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes - razzi@cumin1001
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 19:51 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ac99da75f9507e19472ab3020be638262857ec07}}: Migrate WebUIActionsTracking schemas to Event Platform on testwiki ([[phab:T267347|T267347]]; [[phab:T271164|T271164]]) (duration: 01m 03s)
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 19:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4bb9e5d13be702516368774732a9e1711bec42e5}}: Enables the Wikisource extension on oldwikisource ([[phab:T272163|T272163]]) (duration: 01m 04s)
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 19:40 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/EventLogging/: {{Gerrit|ee830a5ec2051fa970084e89b477a44c384e309c}}: {{Gerrit|f7152a74e00404fc561c44d1c2e37d7f882e2f52}}: EventLogging backport, see commits for details ([[phab:T253121|T253121]]) (duration: 01m 05s)
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 19:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2226.codfw.wmnet
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2375.codfw.wmnet
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2373.codfw.wmnet
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 19:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2371.codfw.wmnet
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 19:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2226.codfw.wmnet
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 19:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|62c9c35a76e2d065922f8c9f5a58672240dea7de}}: Migrate SuggestedTagsAction to Event Platform on all wikis ([[phab:T267351|T267351]]) (duration: 01m 03s)
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 19:21 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|0b46c9f1f75fc773f57bfa70521c9eaf20410b9e}}: [no-op] Add notes about load order of Wikisource and Collection extensions ([[phab:T255790|T255790]]) (duration: 01m 11s)
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 19:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2375.codfw.wmnet
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 19:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2373.codfw.wmnet
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 19:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2371.codfw.wmnet
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:02 cstone: civicrm revision changed from {{Gerrit|a4caad22b1}} to {{Gerrit|3afb54f6f9}}
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 18:53 razzi@cumin1001: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes - razzi@cumin1001
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 18:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2226.codfw.wmnet with reason: REIMAGE
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 18:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2226.codfw.wmnet with reason: REIMAGE
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2373.codfw.wmnet with reason: REIMAGE
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 18:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2375.codfw.wmnet with reason: REIMAGE
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 18:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2371.codfw.wmnet with reason: REIMAGE
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 18:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2375.codfw.wmnet with reason: REIMAGE
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 18:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2373.codfw.wmnet with reason: REIMAGE
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 18:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2371.codfw.wmnet with reason: REIMAGE
* 06:39 moritzm: installing krb5 security updates
* 18:21 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki
* 18:14 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:12 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:08 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:08 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:42 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:36 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:35 ryankemper: [wdqs] Depooled `wdqs1013` to allow it to catch up on lag
* 16:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
* 16:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
* 16:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
* 15:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
* 15:13 moritzm: installing cairo security updates on stretch
* 15:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 14:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5001.wikimedia.org
* 14:17 godog: roll-restart swift-object in eqiad to apply new concurrency
* 14:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast5001.wikimedia.org
* 14:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4002.wikimedia.org
* 14:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast4002.wikimedia.org
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3004.wikimedia.org
* 13:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast3004.wikimedia.org
* 13:38 XioNoX: put eqiad/esams lumen link back in service
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13872 and previous config saved to /var/cache/conftool/dbconfig/20210121-122043-root.json
* 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13871 and previous config saved to /var/cache/conftool/dbconfig/20210121-120540-root.json
* 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13870 and previous config saved to /var/cache/conftool/dbconfig/20210121-115036-root.json
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13868 and previous config saved to /var/cache/conftool/dbconfig/20210121-113533-root.json
* 11:29 marostegui: Stop replication on db1085 to move wiki replicas under the other sanitarium host
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P13867 and previous config saved to /var/cache/conftool/dbconfig/20210121-112849-marostegui.json
* 11:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 11:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 09:44 hoo: Updated the Wikidata property suggester with data from the 2021-01-11 JSON dump and applied the [[phab:T132839|T132839]] workarounds
* 09:00 marostegui: m1 master restart - [[phab:T271540|T271540]]
* 08:51 jynus: stopping puppet and bacula for backup1001 [[phab:T271540|T271540]]
* 08:43 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 08:37 marostegui: Silence m1 hosts in preparation for the restart [[phab:T271540|T271540]]
* 08:34 godog: roll-restart swift-object in codfw to apply new concurrency
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P13864 and previous config saved to /var/cache/conftool/dbconfig/20210121-072101-marostegui.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P13863 and previous config saved to /var/cache/conftool/dbconfig/20210121-070346-marostegui.json
* 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P13862 and previous config saved to /var/cache/conftool/dbconfig/20210121-065459-marostegui.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087', diff saved to https://phabricator.wikimedia.org/P13861 and previous config saved to /var/cache/conftool/dbconfig/20210121-065408-marostegui.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and pool db1099:3318 into s8 vslow', diff saved to https://phabricator.wikimedia.org/P13860 and previous config saved to /var/cache/conftool/dbconfig/20210121-064903-marostegui.json
* 03:54 milimetric@deploy1001: deploy aborted: Minor typo fix (duration: 01m 39s)
* 03:52 milimetric@deploy1001: Started deploy [analytics/refinery@57589e7]: Minor typo fix
* 01:27 ryankemper: [WDQS Deploy] Rollback complete, service health of `wdqs1003` is restored. Need to investigate source of 404 (possibly related to some recent changes we made in the `gui` repo)
* 01:26 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@70f9d37]: 0.3.60 (duration: 02m 53s)
* 01:26 ryankemper: [WDQS Deploy] Rollback of canary `wdqs1003` initiated
* 01:25 ryankemper: [WDQS Deploy] Automated tests passing on canary `wdqs1003` but manually visiting `http://localhost:9999` (my tunnel to `wdqs1003`) gives `404 Not Found` from nginx; aborting deploy
* 01:23 ryankemper@deploy1001: Started deploy [wdqs/wdqs@70f9d37]: 0.3.60
* 01:22 ryankemper: [WDQS Deploy] Tests on canary `wdqs1003` passing before start of deploy, proceeding with deploy of wdqs `0.3.60` to canary
* 00:44 legoktm: legoktm@mwmaint1002:~$ mwscript initSiteStats.php --wiki=trwikivoyage --update
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2369.codfw.wmnet
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2367.codfw.wmnet
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2365.codfw.wmnet
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2363.codfw.wmnet
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2369.codfw.wmnet
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2365.codfw.wmnet
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2367.codfw.wmnet
* 00:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2363.codfw.wmnet


== 2021-01-20 ==
== 2021-07-24 ==
* 23:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2369.codfw.wmnet with reason: REIMAGE
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
* 23:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2365.codfw.wmnet with reason: REIMAGE
* 23:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2367.codfw.wmnet with reason: REIMAGE
* 23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2363.codfw.wmnet with reason: REIMAGE
* 23:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2369.codfw.wmnet with reason: REIMAGE
* 23:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2367.codfw.wmnet with reason: REIMAGE
* 23:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2365.codfw.wmnet with reason: REIMAGE
* 23:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2363.codfw.wmnet with reason: REIMAGE
* 23:30 mutante: releases2002 - rebooting VM
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2361.codfw.wmnet
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2359.codfw.wmnet
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2355.codfw.wmnet
* 23:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2357.codfw.wmnet
* 23:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on releases2002.codfw.wmnet with reason: rebooting to add a disk
* 23:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on releases2002.codfw.wmnet with reason: rebooting to add a disk
* 23:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2357.codfw.wmnet
* 23:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2361.codfw.wmnet
* 23:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2359.codfw.wmnet
* 23:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2355.codfw.wmnet
* 23:03 legoktm: updated docker-registry.discovery.wmnet/wikimedia-buster image
* 23:01 mutante: mw2331, mw2333 - scap pull
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2359.codfw.wmnet with reason: new install on buster
* 22:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2359.codfw.wmnet with reason: new install on buster
* 22:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2359.codfw.wmnet with reason: REIMAGE
* 22:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2361.codfw.wmnet with reason: REIMAGE
* 22:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
* 22:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2361.codfw.wmnet with reason: REIMAGE
* 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2359.codfw.wmnet with reason: REIMAGE
* 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
* 22:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
* 22:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
* 22:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
* 22:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
* 22:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2353.codfw.wmnet
* 22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2351.codfw.wmnet
* 22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2339.codfw.wmnet
* 22:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2338.codfw.wmnet
* 22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2353.codfw.wmnet
* 22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2351.codfw.wmnet
* 22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2339.codfw.wmnet
* 22:15 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2338.codfw.wmnet
* 21:35 milimetric@deploy1001: Finished deploy [analytics/refinery@1313244] (thin): Regular analytics weekly train THIN [analytics/refinery@1313244] (duration: 00m 07s)
* 21:35 milimetric@deploy1001: Started deploy [analytics/refinery@1313244] (thin): Regular analytics weekly train THIN [analytics/refinery@1313244]
* 21:34 milimetric@deploy1001: Finished deploy [analytics/refinery@1313244]: Regular analytics weekly train [analytics/refinery@1313244] (duration: 10m 52s)
* 21:24 milimetric@deploy1001: Started deploy [analytics/refinery@1313244]: Regular analytics weekly train [analytics/refinery@1313244]
* 21:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
* 21:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
* 21:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2338.codfw.wmnet with reason: REIMAGE
* 21:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
* 21:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
* 21:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2339.codfw.wmnet with reason: REIMAGE
* 21:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2339.codfw.wmnet with reason: REIMAGE
* 21:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2338.codfw.wmnet with reason: REIMAGE
* 21:13 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 21:02 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
* 20:56 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
* 20:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
* 20:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2337.codfw.wmnet
* 20:46 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕞🍵 sudo cumin A:cp 'enable-puppet "cdanis deploying {{Gerrit|I558346d}} [[phab:T272330|T272330]]"'
* 20:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2335.codfw.wmnet
* 20:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2333.codfw.wmnet
* 20:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2331.codfw.wmnet
* 20:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2337.codfw.wmnet
* 20:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2335.codfw.wmnet
* 20:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2333.codfw.wmnet
* 20:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2331.codfw.wmnet
* 20:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
* 20:41 effie: restart mc-gp2001, mc-gp2002, mc-gp2003 for [[phab:T269596|T269596]]
* 20:31 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.27 (duration: 03m 05s)
* 20:28 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.27
* 20:23 brennen: 1.36.0-wmf.27 ([[phab:T271341|T271341]]) train: proceeding to group1
* 20:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:17 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕒🍵 sudo cumin A:cp 'disable-puppet "cdanis deploying {{Gerrit|I558346d}} [[phab:T272330|T272330]]"'
* 20:16 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:06 brennen: 1.36.0-wmf.27 ([[phab:T271341|T271341]]) train status as of deploy window: currently blocked at group0 on [[phab:T272508|T272508]]
* 20:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:50 bblack: lvs1015: bringing pybal back online
* 19:47 bblack: lvs1015: stopping pybal to try to fix a lingering ifup service state issue on the host, which may require downing an interface
* 19:33 urbanecm@deploy1001: Synchronized static/images/project-logos: {{Gerrit|5c941678ec739dd6b5257b4a8f866b7e3a257f45}}: Revert: [enwiki] Update celebration logo to "option A" ([[phab:T272526|T272526]]) (duration: 01m 04s)
* 19:24 effie: depool and repool thumbor* to upgrade python-thumbor-wikimedia to v2.9
* 19:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2337.codfw.wmnet with reason: REIMAGE
* 19:22 urbanecm@deploy1001: Synchronized static/images/project-logos: {{Gerrit|13fb338249b3ec73e380c4971ee697f28a2f6d76}}: [enwiki] Update celebration logo to "option A" ([[phab:T272526|T272526]]) (duration: 01m 05s)
* 19:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2335.codfw.wmnet with reason: REIMAGE
* 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2333.codfw.wmnet with reason: REIMAGE
* 19:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2337.codfw.wmnet with reason: REIMAGE
* 19:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
* 19:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2335.codfw.wmnet with reason: REIMAGE
* 19:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2333.codfw.wmnet with reason: REIMAGE
* 19:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2331.codfw.wmnet with reason: REIMAGE
* 19:12 urbanecm@deploy1001: Synchronized wmf-config/config/kuwiki.yaml: {{Gerrit|a736d97463e7a42b41dbcff19a8c2c3c62f8bf6d}}: Enable visualeditor on kuwiki by default ([[phab:T270841|T270841]]; 2/2) (duration: 01m 05s)
* 19:11 XioNoX: add BGP to Lumen in eqiad
* 19:11 urbanecm@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|a736d97463e7a42b41dbcff19a8c2c3c62f8bf6d}}: Enable visualeditor on kuwiki by default ([[phab:T270841|T270841]]; 1/2) (duration: 01m 04s)
* 18:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2325.codfw.wmnet
* 18:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2327.codfw.wmnet
* 18:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2329.codfw.wmnet
* 18:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2316.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2329.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2327.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2325.codfw.wmnet
* 18:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2316.codfw.wmnet
* 18:42 brennen@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/AbuseFilter/includes/View/AbuseFilterViewDiff.php: Backport: [[gerrit:657366{{!}}Catch ClosestFilterVersionNotFoundException in ViewDiff (T272505)]] (duration: 01m 06s)
* 18:29 bblack: lvs1015: re-enabling puppet + pybal - [[phab:T272258|T272258]]
* 18:25 XioNoX: draining esams-eqiad link
* 18:24 mutante: ganeti - creating 150G virtual hard disk and adding it to releases2002 for [[phab:T272092|T272092]]
* 18:22 mutante: ganeti - creating 105G virtual hard disk and adding it to releases1002 for [[phab:T272092|T272092]]
* 18:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
* 18:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2327.codfw.wmnet with reason: new install on buster
* 18:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2327.codfw.wmnet with reason: REIMAGE
* 18:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2329.codfw.wmnet with reason: REIMAGE
* 18:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2325.codfw.wmnet with reason: REIMAGE
* 18:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2329.codfw.wmnet with reason: REIMAGE
* 18:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2316.codfw.wmnet with reason: REIMAGE
* 18:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2327.codfw.wmnet with reason: REIMAGE
* 18:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2325.codfw.wmnet with reason: REIMAGE
* 18:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2316.codfw.wmnet with reason: REIMAGE
* 18:01 bblack: lvs1015 - shutdown for [[phab:T272258|T272258]]
* 17:58 volans@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:54 bblack: lvs1015: stopping pybal with puppet disabled for [[phab:T272258|T272258]]
* 17:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 17:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 17:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 17:24 volans@cumin2001: START - Cookbook sre.dns.netbox
* 16:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
* 16:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
* 16:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
* 16:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
* 16:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
* 16:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
* 16:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
* 15:59 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:58 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
* 15:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
* 15:55 elukey@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 15:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
* 15:47 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13858 and previous config saved to /var/cache/conftool/dbconfig/20210120-154726-kormat.json
* 15:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
* 15:46 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:46 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:46 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
* 15:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:32 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13857 and previous config saved to /var/cache/conftool/dbconfig/20210120-153223-kormat.json
* 15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:24 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.27
* 15:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
* 15:18 brennen: 1.36.0-wmf.27 train unblocked, proceeding to group0 ([[phab:T271341|T271341]])
* 15:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
* 15:17 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13856 and previous config saved to /var/cache/conftool/dbconfig/20210120-151719-kormat.json
* 15:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
* 15:15 kormat@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13855 and previous config saved to /var/cache/conftool/dbconfig/20210120-151555-kormat.json
* 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
* 15:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
* 15:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
* 15:02 kormat@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13854 and previous config saved to /var/cache/conftool/dbconfig/20210120-150216-kormat.json
* 15:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
* 15:00 kormat@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 66%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13853 and previous config saved to /var/cache/conftool/dbconfig/20210120-150051-kormat.json
* 14:59 elukey@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 14:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate QuickSurveys schemas to EventGate on all wikis - [[phab:T271165|T271165]], [[phab:T271166|T271166]] (duration: 01m 05s)
* 14:56 kormat@cumin1001: dbctl commit (dc=all): 'db1109 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13852 and previous config saved to /var/cache/conftool/dbconfig/20210120-145605-kormat.json
* 14:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1109.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1109.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
* 14:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
* 14:47 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate QuickSurveys schemas to EventGate on testwiki - [[phab:T271165|T271165]], [[phab:T271166|T271166]] (duration: 01m 06s)
* 14:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
* 14:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
* 14:45 kormat@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 33%: Reboot [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13851 and previous config saved to /var/cache/conftool/dbconfig/20210120-144547-kormat.json
* 14:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
* 14:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2048.codfw.wmnet
* 14:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
* 14:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
* 14:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 14:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 14:32 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
* 14:26 kormat@cumin1001: dbctl commit (dc=all): 'db1076 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13850 and previous config saved to /var/cache/conftool/dbconfig/20210120-142636-kormat.json
* 14:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1076.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1076.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
* 14:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after reboot. [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13849 and previous config saved to /var/cache/conftool/dbconfig/20210120-142139-kormat.json
* 14:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
* 14:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
* 14:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
* 14:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
* 14:12 kormat@cumin1001: dbctl commit (dc=all): 'db1075 depooling: Rebooting for [[phab:T272255|T272255]]', diff saved to https://phabricator.wikimedia.org/P13848 and previous config saved to /var/cache/conftool/dbconfig/20210120-141230-kormat.json
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1075.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1075.eqiad.wmnet with reason: Rebooting for [[phab:T272255|T272255]]
* 14:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
* 14:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
* 14:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
* 14:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
* 13:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
* 13:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
* 13:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
* 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
* 13:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
* 13:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2039.codfw.wmnet
* 13:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 13:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 13:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 13:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 13:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/Translate/: {{Gerrit|20decbd5cc3de0af655b9419cf69fc442ab056a4}}: Add flag to toggle the usage of the group synchronization cache ([[phab:T272428|T272428]]; [[phab:T182433|T182433]]) (duration: 01m 10s)
* 13:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2039.codfw.wmnet
* 13:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 13:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:14 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
* 13:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 13:02 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 12:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
* 12:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2087.codfw.wmnet with reason: Schema change [[phab:T267767|T267767]]
* 12:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2087.codfw.wmnet with reason: Schema change [[phab:T267767|T267767]]
* 12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2037.codfw.wmnet
* 12:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2037.codfw.wmnet
* 12:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2036.codfw.wmnet
* 12:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2036.codfw.wmnet
* 12:31 godog: bounce icinga on alert1001
* 12:28 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 12:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 12:27 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 12:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2035.codfw.wmnet
* 12:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2035.codfw.wmnet
* 12:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2034.codfw.wmnet
* 12:10 matthiasmullie: EU config window done
* 12:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2034.codfw.wmnet
* 12:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2033.codfw.wmnet
* 12:08 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2fc57b259}}: Remove MediaSearch survey (duration: 01m 10s)
* 12:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2033.codfw.wmnet
* 12:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2032.codfw.wmnet
* 11:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2032.codfw.wmnet
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13847 and previous config saved to /var/cache/conftool/dbconfig/20210120-112808-root.json
* 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13846 and previous config saved to /var/cache/conftool/dbconfig/20210120-111305-root.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13845 and previous config saved to /var/cache/conftool/dbconfig/20210120-105801-root.json
* 10:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2030.codfw.wmnet
* 10:51 XioNoX: Discard the non-whitelisted 172.16.0.0/12 traffic - [[phab:T209082|T209082]]
* 10:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2030.codfw.wmnet
* 10:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2029.codfw.wmnet
* 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13844 and previous config saved to /var/cache/conftool/dbconfig/20210120-104257-root.json
* 10:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2029.codfw.wmnet
* 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2028.codfw.wmnet
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 to stop replication [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13842 and previous config saved to /var/cache/conftool/dbconfig/20210120-103449-marostegui.json
* 10:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2028.codfw.wmnet
* 10:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2027.codfw.wmnet
* 10:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2027.codfw.wmnet
* 10:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2026.codfw.wmnet
* 10:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2026.codfw.wmnet
* 10:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2025.codfw.wmnet
* 09:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2025.codfw.wmnet
* 09:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2024.codfw.wmnet
* 09:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2024.codfw.wmnet
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2023.codfw.wmnet
* 09:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2023.codfw.wmnet
* 09:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2021.codfw.wmnet
* 09:32 moritzm: installing cuminunpriv1001
* 09:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2021.codfw.wmnet
* 09:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2020.codfw.wmnet
* 09:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2020.codfw.wmnet
* 09:19 XioNoX: configure Lumen interfaces
* 09:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2019.codfw.wmnet
* 09:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2019.codfw.wmnet
* 09:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2018.codfw.wmnet
* 09:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2018.codfw.wmnet
* 00:43 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:656284{{!}}Update /analytics/legacy/homepagemodule/ schema version to 1.1.0 (T270309)]] (duration: 01m 03s)
* 00:30 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:655863{{!}}(no-op) GrowthExperiments: Disable link recommendations (T261408)]] (duration: 01m 05s)
* 00:09 legoktm: uploaded docker-report 0.0.4-1~deb9u1 to stretch-wikimedia ([[phab:T179696|T179696]])


== 2021-01-19 ==
== 2021-07-23 ==
* 21:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2314.codfw.wmnet
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 21:51 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group0 wikis to 1.36.0-wmf.26
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 21:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2313.codfw.wmnet
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 21:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2312.codfw.wmnet
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 21:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2315.codfw.wmnet
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 21:46 ottomata: wiping kafka-test cluster data and starting from scratch - [[phab:T255973|T255973]]
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 21:00 Urbanecm: Start of `foreachwikiindblist group2 extensions/AbuseFilter/maintenance/MigrateAflFilter.php --batch-size=1000` ([[phab:T269713|T269713]])
* 16:15 effie: enable puppet on mc-gp* hosts
* 20:09 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.27
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 20:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2315.codfw.wmnet
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 20:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2314.codfw.wmnet
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 20:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2313.codfw.wmnet
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 20:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2312.codfw.wmnet
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 19:46 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 19:30 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 19:27 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 19:22 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 18:58 brennen@deploy1001: Pruned MediaWiki: 1.36.0-wmf.22 (duration: 03m 53s)
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 18:47 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 18:43 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 18:42 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.27 (duration: 41m 57s)
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 18:39 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 18:01 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.27
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 17:59 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on restbase2009.codfw.wmnet with reason: REIMAGE
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 17:59 brennen: starting deploy-promote to testwikis for 1.36.0-wmf.27 ([[phab:T271341|T271341]])
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 17:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2009.codfw.wmnet with reason: REIMAGE
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 17:30 Urbanecm: Start of `foreachwikiindblist group1 extensions/AbuseFilter/maintenance/MigrateAflFilter.php --batch-size=1000` ([[phab:T269713|T269713]])
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 17:08 Urbanecm: Run extensions/AbuseFilter/maintenance/MigrateAflFilter.php for all group0 wikis ([[phab:T269713|T269713]])
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 17:06 Urbanecm: mwscript extensions/AbuseFilter/maintenance/MigrateAflFilter.php --wiki=test2wiki --batch-size=1000 # [[phab:T269713|T269713]]
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 17:04 Urbanecm: mwscript extensions/AbuseFilter/maintenance/MigrateAflFilter.php --wiki=testwiki --batch-size=1000 # [[phab:T269713|T269713]]
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 16:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2314.codfw.wmnet with reason: new install on buster
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2314.codfw.wmnet with reason: new install on buster
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 16:50 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2314.codfw.wmnet with reason: REIMAGE
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 16:48 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2313.codfw.wmnet with reason: REIMAGE
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 16:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 16:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 16:47 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2315.codfw.wmnet with reason: REIMAGE
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 16:46 brennen: 1.36.0-wmf.27 was branched at {{Gerrit|fbb516d8e33924c6cb66c93bb6d42907558c31f3}} for [[phab:T271341|T271341]]
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 16:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2312.codfw.wmnet with reason: REIMAGE
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 16:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2315.codfw.wmnet with reason: REIMAGE
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 16:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2314.codfw.wmnet with reason: REIMAGE
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2313.codfw.wmnet with reason: REIMAGE
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2312.codfw.wmnet with reason: REIMAGE
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]
* 16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:41 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be1046.eqiad.wmnet
* 16:39 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 16:39 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:39 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13838 and previous config saved to /var/cache/conftool/dbconfig/20210119-163637-root.json
* 16:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 16:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 16:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 16:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 16:22 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 16:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 16:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13837 and previous config saved to /var/cache/conftool/dbconfig/20210119-162134-root.json
* 16:14 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 16:07 moritzm: powercycling ms-be1046, stuck during boot
* 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13836 and previous config saved to /var/cache/conftool/dbconfig/20210119-160630-root.json
* 15:58 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 15:58 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 15:51 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop test cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13835 and previous config saved to /var/cache/conftool/dbconfig/20210119-155127-root.json
* 15:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
* 15:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
* 15:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:43 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cuminunpriv1001.eqiad.wmnet
* 15:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
* 15:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
* 15:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
* 15:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
* 15:26 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host cuminunpriv1001.eqiad.wmnet
* 15:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:23 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
* 15:15 Urbanecm: Run `foreachwikiindblist closed extensions/AbuseFilter/maintenance/MigrateAflFilter.php` ([[phab:T269713|T269713]])
* 15:06 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 15:06 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 15:06 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 15:03 Jeff_Green: authdns-update DNS adjustments for frdata-(eqiad{{!}}codfw)
* 14:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1042.eqiad.wmnet
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
* 14:19 marostegui: Sanitize trwikivoyage on db2094:3315, db1124:3315, db1154:3315 [[phab:T271261|T271261]]
* 14:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
* 14:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
* 14:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
* 14:08 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T271264|T271264]])
* 14:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
* 13:49 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T271264|T271264]])
* 13:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1039.eqiad.wmnet
* 13:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1039.eqiad.wmnet
* 13:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1038.eqiad.wmnet
* 13:39 Urbanecm: trwikivoyage is created
* 13:39 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 53s)
* 13:38 godog: bounce logstash on logstash1025 to debug unindexable logs
* 13:37 urbanecm@deploy1001: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 05s)
* 13:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating trwikivoyage ([[phab:T271260|T271260]]) (duration: 00m 55s)
* 13:35 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating trwikivoyage ([[phab:T271260|T271260]]) (duration: 00m 55s)
* 13:34 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating trwikivoyage ([[phab:T271260|T271260]])
* 13:32 urbanecm@deploy1001: Synchronized dblists: Creating trwikivoyage ([[phab:T271260|T271260]]) (duration: 00m 55s)
* 13:31 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating trwikivoyage ([[phab:T271260|T271260]]) (duration: 00m 55s)
* 13:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1038.eqiad.wmnet
* 13:30 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating trwikivoyage ([[phab:T271260|T271260]]) (duration: 00m 56s)
* 13:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1037.eqiad.wmnet
* 13:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1037.eqiad.wmnet
* 13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1036.eqiad.wmnet
* 12:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1036.eqiad.wmnet
* 12:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1035.eqiad.wmnet
* 12:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1035.eqiad.wmnet
* 12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1034.eqiad.wmnet
* 12:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'staging' .
* 12:45 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'production' .
* 12:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:656842{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 12:44 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 12:44 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:656842{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1034.eqiad.wmnet
* 12:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1033.eqiad.wmnet
* 12:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|338c0f9fe32512266c3030f7c9b7f8804ed30432}}: wgAbuseFilterAflFilterMigrationStage: Make WRITE_BOTH everywhere ([[phab:T269712|T269712]]) (duration: 00m 56s)
* 12:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1033.eqiad.wmnet
* 12:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1031.eqiad.wmnet
* 12:25 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1031.eqiad.wmnet
* 12:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1030.eqiad.wmnet
* 12:22 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 12:22 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 12:22 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 12:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1030.eqiad.wmnet
* 12:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1029.eqiad.wmnet
* 12:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 12:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 12:12 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1029.eqiad.wmnet
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a4cbe662655edaa4f6c36e69877766a6a48d828}}: Revert "Switch fiwiki to their 500k temporary logo!" ([[phab:T270974|T270974]]) (duration: 00m 56s)
* 11:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1028.eqiad.wmnet
* 11:54 moritzm: installing remaining openssl 1.1 updates on stretch
* 11:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1028.eqiad.wmnet
* 11:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1026.eqiad.wmnet
* 11:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1026.eqiad.wmnet
* 11:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 11:34 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 11:33 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 11:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1025.eqiad.wmnet
* 11:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1025.eqiad.wmnet
* 11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1024.eqiad.wmnet
* 11:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 11:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'test' .
* 11:11 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'canary' .
* 11:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1024.eqiad.wmnet
* 11:10 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1023.eqiad.wmnet
* 11:06 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1023.eqiad.wmnet
* 10:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1021.eqiad.wmnet
* 10:56 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1021.eqiad.wmnet
* 10:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1020.eqiad.wmnet
* 10:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be1020.eqiad.wmnet
* 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2017.codfw.wmnet
* 10:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2017.codfw.wmnet
* 09:51 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2017.codfw.wmnet
* 09:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2017.codfw.wmnet
* 09:49 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2016.codfw.wmnet
* 09:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-be2016.codfw.wmnet
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 to stop replication [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13828 and previous config saved to /var/cache/conftool/dbconfig/20210119-090100-marostegui.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078, depooled by mistake', diff saved to https://phabricator.wikimedia.org/P13827 and previous config saved to /var/cache/conftool/dbconfig/20210119-085918-marostegui.json
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 to stop replication [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13826 and previous config saved to /var/cache/conftool/dbconfig/20210119-085856-marostegui.json
* 08:54 godog: swift codfw-prod: more weight to ms-be20[58-61] - [[phab:T269337|T269337]]
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13825 and previous config saved to /var/cache/conftool/dbconfig/20210119-080839-root.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13824 and previous config saved to /var/cache/conftool/dbconfig/20210119-075336-root.json
* 07:41 oblivian@deploy1001: Synchronized README: Null deployments to test php restarts from scap (duration: 01m 23s)
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13823 and previous config saved to /var/cache/conftool/dbconfig/20210119-073832-root.json
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13822 and previous config saved to /var/cache/conftool/dbconfig/20210119-072329-root.json
* 07:14 elukey: clean up prometheus es exporter units on es-codfw nodes not needed anymore
* 07:02 marostegui: Stop MySQL on db1082 [[phab:T272008|T272008]]
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 to stop replication [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13821 and previous config saved to /var/cache/conftool/dbconfig/20210119-065748-marostegui.json
* 06:04 marostegui: Upgrade kernel on pc2007 pc2008 pc2009 pc2010 [[phab:T272121|T272121]]
* 04:39 Krinkle: unlocked per https://phabricator.wikimedia.org/T272215#6755025
* 04:37 Krinkle: locks scap on deploy1001 as precaution


== 2021-01-18 ==
== 2021-07-22 ==
* 21:33 eileen: civicrm revision changed from {{Gerrit|4220fc8177}} to {{Gerrit|a4caad22b1}}, config revision is {{Gerrit|f08249ecf9}}
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2311.codfw.wmnet
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platform migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2310.codfw.wmnet
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2309.codfw.wmnet
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 21:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2307.codfw.wmnet
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2309.codfw.wmnet
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2307.codfw.wmnet
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2310.codfw.wmnet
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2311.codfw.wmnet
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 20:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2311.codfw.wmnet with reason: REIMAGE
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 20:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2309.codfw.wmnet with reason: REIMAGE
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2310.codfw.wmnet with reason: REIMAGE
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2311.codfw.wmnet with reason: REIMAGE
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2307.codfw.wmnet with reason: REIMAGE
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2310.codfw.wmnet with reason: REIMAGE
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 20:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2309.codfw.wmnet with reason: REIMAGE
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 20:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2307.codfw.wmnet with reason: REIMAGE
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 20:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2305.codfw.wmnet
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 20:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2303.codfw.wmnet
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 20:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2277.codfw.wmnet
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 20:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2276.codfw.wmnet
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 20:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2303.codfw.wmnet
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2305.codfw.wmnet
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2277.codfw.wmnet
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2276.codfw.wmnet
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 19:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2305.codfw.wmnet with reason: REIMAGE
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 19:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2303.codfw.wmnet with reason: REIMAGE
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 19:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2305.codfw.wmnet with reason: REIMAGE
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 19:38 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2303.codfw.wmnet with reason: REIMAGE
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 19:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2277.codfw.wmnet with reason: REIMAGE
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 19:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2276.codfw.wmnet with reason: REIMAGE
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 19:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2277.codfw.wmnet with reason: REIMAGE
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 19:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2276.codfw.wmnet with reason: REIMAGE
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 18:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2275.codfw.wmnet
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 18:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2274.codfw.wmnet
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 18:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2273.codfw.wmnet
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 18:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2271.codfw.wmnet
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 18:36 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1136,1138].eqiad.wmnet
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2274.codfw.wmnet
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2275.codfw.wmnet
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 18:34 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1136,1138].eqiad.wmnet
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2273.codfw.wmnet
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 18:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2271.codfw.wmnet
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 18:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1138.eqiad.wmnet with reason: REIMAGE
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 18:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1138.eqiad.wmnet with reason: REIMAGE
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 18:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1132.eqiad.wmnet
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 18:20 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1132.eqiad.wmnet
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 18:19 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1130.eqiad.wmnet
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 18:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1136.eqiad.wmnet with reason: REIMAGE
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 18:17 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1130.eqiad.wmnet
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 18:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1136.eqiad.wmnet with reason: REIMAGE
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 18:14 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1128.eqiad.wmnet
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 18:12 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1128.eqiad.wmnet
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:51 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1124-1127].eqiad.wmnet
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 17:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2274.codfw.wmnet with reason: REIMAGE
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 17:49 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1124-1127].eqiad.wmnet
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 17:49 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2275.codfw.wmnet with reason: REIMAGE
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 17:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2273.codfw.wmnet with reason: REIMAGE
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 17:48 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1121-1123].eqiad.wmnet
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 17:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2271.codfw.wmnet with reason: REIMAGE
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 17:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2275.codfw.wmnet with reason: REIMAGE
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 17:46 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1121-1123].eqiad.wmnet
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 17:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2274.codfw.wmnet with reason: REIMAGE
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 17:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2273.codfw.wmnet with reason: REIMAGE
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 17:45 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2271.codfw.wmnet with reason: REIMAGE
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 17:44 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1120.eqiad.wmnet
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 17:42 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1120.eqiad.wmnet
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 17:38 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1118.eqiad.wmnet
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 17:36 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1118.eqiad.wmnet
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 17:32 mutante: reimaging mw2271,mw2273,mw2274,mw227 (codfw only)
* 14:27 moritzm: installing libwebp security updates on stretch
* 16:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1137.eqiad.wmnet with reason: REIMAGE
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 16:22 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1137.eqiad.wmnet with reason: REIMAGE
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1135.eqiad.wmnet with reason: REIMAGE
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 16:03 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1135.eqiad.wmnet with reason: REIMAGE
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: REIMAGE
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 15:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: REIMAGE
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 15:48 moritzm: installing wavpack security updates
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 15:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1130.eqiad.wmnet with reason: REIMAGE
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 15:36 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1130.eqiad.wmnet with reason: REIMAGE
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1128.eqiad.wmnet with reason: REIMAGE
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 15:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1128.eqiad.wmnet with reason: REIMAGE
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 15:10 kormat@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 14:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 11:36 Lucas_WMDE: EU backport+config window done
* 14:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 14:43 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 14:31 kormat@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 14:30 arturo: updating packages in buster-wikimedia/thirdparty/ceph-nautilus-buster ([[phab:T272296|T272296]])
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 14:26 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 14:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 14:18 kormat@cumin1001: START - Cookbook sre.hosts.decommission
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 13:34 moritzm: uploaded wmf-sre-laptop 0.3.2 to apt.wikimedia.org
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 13:26 volans: installed spicerack 0.0.48-1+deb10u1 on cumin hosts
* 10:42 effie: restart pybal on lvs2010 and lvs1016
* 13:12 marostegui: Upgrade db2071 to 10.4.17 - [[phab:T268457|T268457]]
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 13:08 XioNoX: add NAT rule on pfw3-eqiad - [[phab:T272066|T272066]]
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 12:56 XioNoX: add NAT rule on pfw3-codfw - [[phab:T272066|T272066]]
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2008.codfw.wmnet
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 12:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1127.eqiad.wmnet with reason: REIMAGE
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 12:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2008.codfw.wmnet
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 12:31 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1127.eqiad.wmnet with reason: REIMAGE
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 12:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1126.eqiad.wmnet with reason: REIMAGE
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 12:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1126.eqiad.wmnet with reason: REIMAGE
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 12:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1125.eqiad.wmnet with reason: REIMAGE
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 12:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2007.codfw.wmnet
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 12:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1125.eqiad.wmnet with reason: REIMAGE
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 12:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2007.codfw.wmnet
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 12:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2006.codfw.wmnet
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 12:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 12:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1124.eqiad.wmnet with reason: REIMAGE
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 12:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2006.codfw.wmnet
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 12:08 volans: uploaded spicerack_0.0.48 to apt.wikimedia.org buster-wikimedia
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 12:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2005.codfw.wmnet
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 12:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE
* 12:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1123.eqiad.wmnet with reason: REIMAGE
* 11:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe2005.codfw.wmnet
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE
* 11:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1008.eqiad.wmnet
* 11:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1122.eqiad.wmnet with reason: REIMAGE
* 11:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1008.eqiad.wmnet
* 11:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1007.eqiad.wmnet
* 11:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1007.eqiad.wmnet
* 11:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1121.eqiad.wmnet with reason: REIMAGE
* 11:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1121.eqiad.wmnet with reason: REIMAGE
* 11:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1006.eqiad.wmnet
* 11:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1006.eqiad.wmnet
* 11:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1120.eqiad.wmnet with reason: REIMAGE
* 11:18 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1120.eqiad.wmnet with reason: REIMAGE
* 11:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1005.eqiad.wmnet
* 11:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ms-fe1005.eqiad.wmnet
* 11:10 hashar: Restarting Gerrit main instance on gerrit1001.wikimedia.org
* 11:08 hashar: Restarting Gerrit replica on gerrit2001.wikimedia.org
* 10:58 moritzm: installing python2.7 security updates on Stretch
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13799 and previous config saved to /var/cache/conftool/dbconfig/20210118-102959-root.json
* 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13798 and previous config saved to /var/cache/conftool/dbconfig/20210118-101456-root.json
* 10:00 _joe_: restarting pybal on lvs1016, not talking to its etcd server
* 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13797 and previous config saved to /var/cache/conftool/dbconfig/20210118-095952-root.json
* 09:51 kormat@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13796 and previous config saved to /var/cache/conftool/dbconfig/20210118-094449-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to stop replication [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13795 and previous config saved to /var/cache/conftool/dbconfig/20210118-092546-marostegui.json
* 09:24 kormat@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13794 and previous config saved to /var/cache/conftool/dbconfig/20210118-092429-root.json
* 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1105:3311 from vslow', diff saved to https://phabricator.wikimedia.org/P13793 and previous config saved to /var/cache/conftool/dbconfig/20210118-092003-marostegui.json
* 09:13 moritzm: installing openssl 1.1 security updates on stretch
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13791 and previous config saved to /var/cache/conftool/dbconfig/20210118-090926-root.json
* 09:06 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:01 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13790 and previous config saved to /var/cache/conftool/dbconfig/20210118-085422-root.json
* 08:46 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 08:42 kormat@cumin1001: START - Cookbook sre.hosts.decommission
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After moving wikireplicas to another host', diff saved to https://phabricator.wikimedia.org/P13788 and previous config saved to /var/cache/conftool/dbconfig/20210118-083919-root.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 to stop replication, place db1105:3311 temporarily in vslow [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P13787 and previous config saved to /var/cache/conftool/dbconfig/20210118-081740-marostegui.json
* 08:15 moritzm: installing remaining openssl 1.0 security updates on stretch
* 08:13 elukey: clean up old archiva debs and upload 2.2.4-3 to buster-wikimedia - [[phab:T272082|T272082]]
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: After restarting for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13786 and previous config saved to /var/cache/conftool/dbconfig/20210118-080122-root.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: After restarting for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13785 and previous config saved to /var/cache/conftool/dbconfig/20210118-074618-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 50%: After restarting for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13784 and previous config saved to /var/cache/conftool/dbconfig/20210118-073115-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: After restarting for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13783 and previous config saved to /var/cache/conftool/dbconfig/20210118-071611-root.json
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P13782 and previous config saved to /var/cache/conftool/dbconfig/20210118-065312-marostegui.json
* 06:35 marostegui: Reboot dbproxy2001, dbproxy2002, dbproxy2003 for kernel upgrade
* 06:22 marostegui: Reboot db1154 and db1155 for kernel upgrade


== 2021-01-16 ==
== 2021-07-21 ==
* 12:18 elukey: elukey@cumin1001:~$ sudo cumin 'A:mw-app-canary and A:mw-eqiad' 'run-puppet-agent' -b 10 - [[phab:T272215|T272215]]
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 12:10 elukey: elukey@cumin1001:~$ sudo cumin 'A:mw-eqiad' 'run-puppet-agent' -b 10 - [[phab:T272215|T272215]]
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 11:23 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=appserver,dc=eqiad
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:27 dancy: testing upcoming Scap release on beta
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 10:50 moritzm: installing systemd security updates on bullseye
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 10:14 effie: enable puppet on mw* servers
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 08:45 effie: disable puppet on codfw mw hosts to deploy 702592
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:17 effie: enable puppet on alert*
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 07:16 godog: powercycle ms-be2048
* 07:03 moritzm: installing systemd security updates on stretch