You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(catrope@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/FlaggedRevs/: Fix logic for determining if pending edits were null (T249277) (duration: 01m 00s))
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(433 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== 2020-04-03 ==
== 2021-08-03 ==
* 00:42 catrope@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/FlaggedRevs/: Fix logic for determining if pending edits were null ([[phab:T249277|T249277]]) (duration: 01m 00s)
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master  [[phab:T287574|T287574]]
* 09:12 moritzm: installinh php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2020-04-02 ==
== 2021-08-02 ==
* 23:53 hoo: Started Wikibase rebuildItemsPerSite on mwmaint1002 for wikidatawiki. Can be killed at any time, if necessary.
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:09 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't try to grant 'oathauth-enable' to '*' (part 2) ([[phab:T248282|T248282]]) (duration: 00m 58s)
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 19:53 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/Translate/specials/SpecialExportTranslations.php: [[phab:T249258|T249258]]: Revert 'Special:ExportTranslations: Disallow exporting huge groups' (duration: 00m 59s)
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 19:38 ppchelko@deploy1001: Finished deploy [restbase/deploy@7923c1f]: Update CSP headers for mobileapps [[phab:T248431|T248431]] (duration: 15m 13s)
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:35 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/MovePage.php: [[phab:T248789|T248789]] MovePage: Use correct Title when creating the null revision (duration: 00m 59s)
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 19:30 hashar: docker-pkg update on contint hosts
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:30 hashar@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 12s)
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:29 hashar@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:23 ppchelko@deploy1001: Started deploy [restbase/deploy@7923c1f]: Update CSP headers for mobileapps [[phab:T248431|T248431]]
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 19:05 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]]
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 19:00 longma: promoting all to 1.35.0-wmf.26
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 18:39 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]] (duration: 01m 05s)
* 21:31 tzatziki: removing 1 file for legal compliance
* 18:38 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.26  refs [[phab:T247773|T247773]]
* 21:16 tzatziki: removing 7 files for legal compliance
* 18:37 longma: rolling group1 to 1.35.0-wmf.26
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:27 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/MobileFrontend/: SWAT: {{Gerrit|4e2a092}}: EditorGateway: Fix handling of null sectionId ([[phab:T249169|T249169]]) (duration: 01m 09s)
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:22 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.26/extensions/VisualEditor/modules/ve-mw: SWAT: {{Gerrit|94ded03}}: Fix issues with treating section "numbers" as integers ([[phab:T248795|T248795]]; [[phab:T248968|T248968]]; [[phab:T249112|T249112]]) (duration: 01m 10s)
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:49 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@7650fbe]: Update mobileapps to {{Gerrit|61977bd7}} (duration: 03m 21s)
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:45 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@7650fbe]: Update mobileapps to {{Gerrit|61977bd7}}
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 16:53 joal@deploy1001: Finished deploy [analytics/refinery@5b254c8] (thin): Regular analytics weekly train THIN [analytics/refinery@5b254c8] (duration: 00m 08s)
* 19:00 urbanecm: Morning B&C window completed
* 16:53 joal@deploy1001: Started deploy [analytics/refinery@5b254c8] (thin): Regular analytics weekly train THIN [analytics/refinery@5b254c8]
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 16:49 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/actions/Action.php: [[phab:T249162|T249162]] Partially revert 'WikiPage/Article split. Rely on Article inside Action' (duration: 01m 07s)
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 16:44 joal@deploy1001: Finished deploy [analytics/refinery@5b254c8]: Regular analytics weekly train [analytics/refinery@5b254c8] (duration: 13m 50s)
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:37 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:34 volans@cumin1001: START - Cookbook sre.dns.netbox
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 16:34 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 05s)
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:33 jforrester@deploy1001: sync-file aborted: [[phab:T249014|T249014]] [siwiki] Change wgSitename to drop the ',' (duration: 00m 00s)
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 16:32 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T249014|T249014]] [siwiki] Change wgSitename to drop the ',' (duration: 01m 07s)
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 16:30 joal@deploy1001: Started deploy [analytics/refinery@5b254c8]: Regular analytics weekly train [analytics/refinery@5b254c8]
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 16:19 XioNoX: upgrade netflow4001's fastnetmon to 1.1.4 - [[phab:T240658|T240658]]
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:56 XioNoX: push new test switch config for cloudvirt2001 - [[phab:T248425|T248425]]
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 14:33 vgutierrez: Enable inbound TLSv1.3 in upload@codfw - [[phab:T170567|T170567]]
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 14:33 vgutierrez: Enable TLS Session tickets in codfw - [[phab:T245616|T245616]]
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 14:24 jbond42: updating bluez on ganeti and cloudvirt
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10865 and previous config saved to /var/cache/conftool/dbconfig/20200402-142338-marostegui.json
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10864 and previous config saved to /var/cache/conftool/dbconfig/20200402-141802-marostegui.json
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10863 and previous config saved to /var/cache/conftool/dbconfig/20200402-141335-marostegui.json
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1111 after schema change', diff saved to https://phabricator.wikimedia.org/P10862 and previous config saved to /var/cache/conftool/dbconfig/20200402-141149-marostegui.json
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 13:50 marostegui: Compress wbqc_constraints on testcommonswiki and commonswiki (empty tables) - [[phab:T248967|T248967]]
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 13:44 vgutierrez: update puppet compiler facts
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:40 marostegui: Deploy schema change on db1111
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111 for schema change', diff saved to https://phabricator.wikimedia.org/P10861 and previous config saved to /var/cache/conftool/dbconfig/20200402-133956-marostegui.json
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 13:32 gehel: OSM data reimport on maps2004 - [[phab:T249086|T249086]]
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 12:55 mutante: mw1390 - mw1399 - pooled and active but status "staged" in netbox, fixing to 'active'
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 12:52 mutante: mw1297 - is pooled and serving traffic but status "staged" in netbox. set to "active"
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 after schema change', diff saved to https://phabricator.wikimedia.org/P10858 and previous config saved to /var/cache/conftool/dbconfig/20200402-114020-marostegui.json
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 11:06 mutante: decom planet1001 ([[phab:T248863|T248863]])
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 10:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 10:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 10:19 marostegui: Deploy schema change on db1087, this will generate lag on s8 on wiki replicas
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for schema change', diff saved to https://phabricator.wikimedia.org/P10857 and previous config saved to /var/cache/conftool/dbconfig/20200402-101920-marostegui.json
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 10:17 elukey: set up TLS encryption for all pmacct instances on netflow* to Kafka Jumbo
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104 after schema change', diff saved to https://phabricator.wikimedia.org/P10856 and previous config saved to /var/cache/conftool/dbconfig/20200402-101747-marostegui.json
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 09:47 marostegui: Remove haproxy@10.64.37.14 from labsdb hosts - [[phab:T231280|T231280]] [[phab:T248944|T248944]]
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 09:44 gehel: CORRECTION: depool maps2004 for data reimport - [[phab:T249086|T249086]]
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 09:40 gehel: depool wdqs2004 for data reimport - [[phab:T249086|T249086]]
* 12:20 mutante: gerrit servers: disabling puppet
* 09:33 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided) (duration: 00m 18s)
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 09:32 oblivian@deploy1001: Started deploy [docker-pkg/deploy@9f2ba2c]: (no justification provided)
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 09:28 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@4f86d77]: (no justification provided) (duration: 00m 09s)
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 09:28 oblivian@deploy1001: Started deploy [docker-pkg/deploy@4f86d77]: (no justification provided)
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 08:51 marostegui: Deploy schema change db1104
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 for schema change', diff saved to https://phabricator.wikimedia.org/P10854 and previous config saved to /var/cache/conftool/dbconfig/20200402-085057-marostegui.json
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092 after schema change', diff saved to https://phabricator.wikimedia.org/P10853 and previous config saved to /var/cache/conftool/dbconfig/20200402-085019-marostegui.json
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 08:28 gehel: repooling wdqs1006 - catched up on lag
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 08:22 vgutierrez: Enable inbound TLSv1.3 in upload@esams - [[phab:T170567|T170567]]
* 11:27 hashar: restarting Jenkins on contint2001
* 08:21 vgutierrez: Enable TLS Session tickets in esams - [[phab:T245616|T245616]]
* 11:27 hashar: restarting Jenkins on contint1001
* 07:45 moritzm: bounced ferm on ms-be1040
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:27 marostegui: Deploy schema change on db1092
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for schema change', diff saved to https://phabricator.wikimedia.org/P10850 and previous config saved to /var/cache/conftool/dbconfig/20200402-072730-marostegui.json
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P10849 and previous config saved to /var/cache/conftool/dbconfig/20200402-072500-marostegui.json
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:49 marostegui: Deploy schema change on db1101:3318
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P10848 and previous config saved to /var/cache/conftool/dbconfig/20200402-054931-marostegui.json
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:29 elukey: powercycle analytics1045 (host not responsive to ssh, weird chars showed in mgmt serial console)
* 11:13 urbanecm: EU B&C window completed
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 11:08 moritzm: installing openjdk-11 security updates
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 07:24 moritzm: installing libsndfile security updates on buster
* 07:12 moritzm: installing aspell security updates
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)


== 2020-04-01 ==
== 2021-07-31 ==
* 22:44 volker-e@deploy1001: Finished deploy [design/style-guide@4bfe647]: Deploy design/style-guide:  (duration: 00m 08s)
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 22:43 volker-e@deploy1001: Started deploy [design/style-guide@4bfe647]: Deploy design/style-guide:
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
* 22:02 volans: forcing logrotate on netflow2001 to compress yesterday's logs
* 21:53 volans: force-rebooting ms-be1023, unresponsive - [[phab:T249174|T249174]]
* 21:50 volans: stopped and restarted kafkatee-webrequest.service on netflow2001, was in a restart loop
* 19:48 marxarelli: rollback of 1.35.0-wmf.26 from group1 ([[phab:T247773|T247773]]). blocked by [[phab:T249162|T249162]]
* 19:30 dduvall@deploy1001: rebuilt and synchronized wikiversions files: rollback 1.35.0-wmf.26 from group1
* 19:21 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.26 (duration: 01m 06s)
* 19:20 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.26
* 19:18 marxarelli: promoting group1 to 1.35.0-wmf.26 to group1
* 17:21 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕐☕ homer 'cr*eqord*' commit 'enable sampling on eqord Iac15379cc'
* 16:54 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕐☕ homer 'cr*eqdfw*' commit 'enable sampling on eqdfw Iac15379cc'
* 16:39 vgutierrez: pool cp2027 - [[phab:T248816|T248816]]
* 16:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:17 ariel@deploy1001: Finished deploy [dumps/dumps@21363c1]: page range prefetch fixup (duration: 00m 09s)
* 16:17 ariel@deploy1001: Started deploy [dumps/dumps@21363c1]: page range prefetch fixup
* 15:33 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 15:31 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
* 15:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 15:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:27 vgutierrez: depool & decommission cp20[16,19,23,27] - [[phab:T249125|T249125]]
* 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 after schema change', diff saved to https://phabricator.wikimedia.org/P10845 and previous config saved to /var/cache/conftool/dbconfig/20200401-152258-marostegui.json
* 15:11 herron: performing kafka-main rolling restarts to pick up security updates
* 14:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:43 vgutierrez: depool && decommission cp[2018,2020,2022,2024-2026].codfw.wmnet - [[phab:T249115|T249115]]
* 14:32 gehel: depooling wdqs1006 to allow catching up on lag
* 14:30 vgutierrez: pool cp2042 - [[phab:T248816|T248816]]
* 14:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:13 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:09 XioNoX: remove AS-path prepending in esams
* 13:47 XioNoX: remove AS-path prepending in eqsin
* 13:39 vgutierrez: pool cp2041 - [[phab:T248816|T248816]]
* 13:34 mutante: sodium (mirror): sudo -u mirror ftpsync to get Debian mirror updated (Icinga says it's old)
* 13:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 13:17 marostegui: Deploy schema change on db1099:3318
* 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 for schema change', diff saved to https://phabricator.wikimedia.org/P10843 and previous config saved to /var/cache/conftool/dbconfig/20200401-131719-marostegui.json
* 13:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 12:19 tgr@deploy1001: Synchronized wmf-config/config: SWAT: [[gerrit:584579{{!}}Sync growthexperiments dblist with actual state of wmgUseGrowthExperiments (T248844)]] (duration: 01m 06s)
* 12:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:17 tgr@deploy1001: Synchronized dblists/growthexperiments.dblist: SWAT: [[gerrit:584579{{!}}Sync growthexperiments dblist with actual state of wmgUseGrowthExperiments (T248844)]] (duration: 01m 05s)
* 12:17 XioNoX: restart nfacct on netflow4001 for kafka tls tests - [[phab:T248980|T248980]]
* 12:15 vgutierrez: depool & decommission cp2013 - [[phab:T249088|T249088]]
* 12:14 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: re-sync (duration: 01m 06s)
* 12:12 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:585059{{!}}Enable password-reset-update on all other than Wikipedias (T245791)]] (duration: 01m 07s)
* 12:09 marostegui: Deploy schema change on db1116:3318
* 12:05 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Revert enabling WikibaseQualityConstraints on Commons take 2 (duration: 01m 08s)
* 12:04 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Revert enabling WikibaseQualityConstraints on Commons (duration: 01m 05s)
* 11:54 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4968501}}: Restrict short URL management log to stewards ([[phab:T221073|T221073]]; take II) (duration: 01m 05s)
* 11:53 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4968501}}: Restrict short URL management log to stewards ([[phab:T221073|T221073]]) (duration: 01m 07s)
* 11:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php:  [SDC] Enable WikibaseQualityConstraints on Commons take II (duration: 01m 06s)
* 11:44 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] Enable WikibaseQualityConstraints on Commons (duration: 01m 18s)
* 11:20 cormacparle__: created table wbqc_constraints on commonswiki
* 11:03 jbond42: install bluez update on ganeti-canary and cloudvirt/cloudcontrol-dev
* 11:01 mutante: planet1001 - reinstall OS to test install_server switch, ATS switched to planet1002 earlier
* 10:47 marostegui: Deploy schema change on dbstore1005:3318
* 10:25 vgutierrez: pool cp2040 - [[phab:T248816|T248816]]
* 10:16 oblivian@puppetmaster1001: conftool action : set/pooled=yes:weight=1; selector: service=canary
* 09:55 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:46 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 09:37 marostegui: Deploy schema change on s8 codfw, this will generate lag on codfw
* 09:35 XioNoX: Update install servers IPs (dhcp helpers + firewall rules) - [[phab:T224576|T224576]]
* 09:34 mutante: install_servers: DHCP_relay in routers and TFTP server in DHCP server config have been switched from install1002/2002 to install1003/2003 - doing a test install, but if any issues report on [[phab:T224576|T224576]]
* 09:26 marostegui: last entry was for db2093
* 09:26 marostegui: Downgrade mariadb package from 10.4.12-2 to 10.4.12-1
* 09:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:05 mutante: planet - the backend server has been switched from planet1001 (stretch) to planet1002 (buster) - [[phab:T247651|T247651]]
* 08:46 mutante: deneb, boron: systemctl reset-failed to clear up systemd state alerts
* 08:43 marostegui: Stop haproxy on dbproxy1010 [[phab:T248944|T248944]]
* 08:37 jynus: restart bacula at backup1001
* 08:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 08:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:28 vgutierrez: depool & decommission cp2017 - [[phab:T249084|T249084]]
* 08:21 vgutierrez: pool cp2039 - [[phab:T248816|T248816]]
* 08:09 marostegui: Deploy schema change on db1138 (s4 primary master)
* 08:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 after schema change', diff saved to https://phabricator.wikimedia.org/P10841 and previous config saved to /var/cache/conftool/dbconfig/20200401-071339-marostegui.json
* 07:12 vgutierrez: pool cp2038 - [[phab:T248816|T248816]]
* 06:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 06:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:36 vgutierrez: depool & decommission cp2012 - [[phab:T249080|T249080]]
* 06:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:39 marostegui: Deploy schema change on db1121 (this will create lag on s4 labs)
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for schema change', diff saved to https://phabricator.wikimedia.org/P10840 and previous config saved to /var/cache/conftool/dbconfig/20200401-053827-marostegui.json
* 00:39 reedy@deploy1001: Synchronized docroot/mediawiki.org/xml/: Update http and prot rel links to https, fix link to sitelist in MW Core (duration: 01m 06s)
* 00:12 reedy@deploy1001: Synchronized docroot/mediawiki.org/xml/: Add export-0.11 (duration: 01m 05s)


== 2020-03-31 ==
== 2021-07-30 ==
* 22:23 marxarelli: group0 to 1.35.0-wmf.26 ([[phab:T247773|T247773]]); no rise in error rates following redeployment
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:13 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.26
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:07 dduvall@deploy1001: rebuilt and synchronized wikiversions files: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]])
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 21:54 dduvall@deploy1001: sync aborted: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]]) (duration: 07m 31s)
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 21:47 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.26 ([[phab:T247773|T247773]])
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 21:46 jforrester@deploy1001: Synchronized php-1.35.0-wmf.26/includes/user/UserNameUtils.php: [[phab:T249045|T249045]] Use wfMessage in UserNameUtils::isUsable for now (duration: 00m 58s)
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 21:05 eileen: process-control config revision is {{Gerrit|f80d248113}} - (catch up dedupe now off - fyi MBeat )
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 20:59 hashar: contint1001: manually reverted /lib/systemd/system/jenkins.service
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 20:51 hashar: Restarting Jenkins for new CSP rules # [[phab:T245658|T245658]]
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 20:26 dduvall@deploy1001: rebuilt and synchronized wikiversions files: rolling back 1.35.0-wmf.26 testwiki deployment following significant increase in error rate (cc [[phab:T247773|T247773]])
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 20:14 marxarelli: correction: RequestContext::getLanguage errors are for testwiki deployment, pre group0
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 20:08 marxarelli: a slew of "ErrorException from line 334 of /srv/mediawiki/php-1.35.0-wmf.26/includes/context/RequestContext.php: PHP Warning: Recursion detected in RequestContext::getLanguage" after group0 deployment (cc [[phab:T247773|T247773]])
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:04 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.26 and rebuild l10n cache (duration: 142m 48s)
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 19:20 ariel@deploy1001: Finished deploy [dumps/dumps@713c297]: more filelist methods cleanup, sort prefetch possible files properly (duration: 00m 04s)
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 19:20 ariel@deploy1001: Started deploy [dumps/dumps@713c297]: more filelist methods cleanup, sort prefetch possible files properly
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:08 ariel@deploy1001: Finished deploy [dumps/dumps@8376c62]: bring snapshot1010 up to date (duration: 00m 05s)
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 ariel@deploy1001: Started deploy [dumps/dumps@8376c62]: bring snapshot1010 up to date
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:42 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.26 and rebuild l10n cache
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 17:40 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.23 (duration: 26m 51s)
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:38 elukey: restart elasticsearch_6@cloudelastic-chi-eqiad.service on cloudelastic1001 to see if it recovers from a trashing/gc state - [[phab:T231517|T231517]]
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 16:30 marxarelli: 1.35.0-wmf.26 was branched at {{Gerrit|bec758b668aaa57fc259a1d0ecf3b35340d2661b}} for [[phab:T247773|T247773]]
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 16:24 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 00s)
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 16:15 vgutierrez: pool cp2037 - [[phab:T248816|T248816]]
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 15:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 15:35 mutante: decom mw1254 through mw1258 (last remaining old servers in rack D5, depooled a while ago and average response time is again under 200ms) [[phab:T247780|T247780]]
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 15:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 15:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 15:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 15:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 15:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 15:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 15:26 vgutierrez: depool & decommission cp2010 - [[phab:T249002|T249002]]
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 15:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 15:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245794|T245794]] Enable DiscussionTools as a beta feature on four wikis (duration: 01m 00s)
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 15:05 cdanis: cr1-eqiad: commit flex-flow-sizing [[phab:T248394|T248394]]
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 15:01 cdanis: cr2-eqiad: commit flex-flow-sizing [[phab:T248394|T248394]]
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:43 vgutierrez: pool cp2036 - [[phab:T248816|T248816]]
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:21 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw125[4-8].eqiad.wmnet
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 14:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 14:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1091 after schema change', diff saved to https://phabricator.wikimedia.org/P10834 and previous config saved to /var/cache/conftool/dbconfig/20200331-141459-marostegui.json
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 14:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 14:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 14:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw125[4-8].eqiad.wmnet
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 13:31 vgutierrez: Enable TLS Session tickets in eqsin - [[phab:T245616|T245616]]
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 13:05 XioNoX: update nat on pfw3-codfw - [[phab:T248906|T248906]]
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 13:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 13:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:23 moritzm: installing libsndfile security updates on stretch
* 12:49 _joe_: switching all appserver canaries to envoy
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 12:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 12:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 12:45 marostegui: Deploy schema change on db1091
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 for schema change', diff saved to https://phabricator.wikimedia.org/P10833 and previous config saved to /var/cache/conftool/dbconfig/20200331-124452-marostegui.json
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 12:34 _joe_: transitioning mw1261 to envoy
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 12:23 vgutierrez: rolling upgrade of ATS to version 8.0.6-1wm5 - [[phab:T248938|T248938]]
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 11:30 Lucas_WMDE: EU SWAT done
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 11:30 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:584874{{!}}Disable TwoColConflict talk page workflow (T230231)]], take II (duration: 00m 57s)
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 11:29 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:584874{{!}}Disable TwoColConflict talk page workflow (T230231)]] (duration: 00m 58s)
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 11:11 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584574{{!}}Enable ContentTranslation in Lithuanian Wikipedia as a default tool (T248179)]], take II (duration: 00m 59s)
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:584574{{!}}Enable ContentTranslation in Lithuanian Wikipedia as a default tool (T248179)]] (duration: 01m 00s)
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 10:46 _joe_: disabled puppet on canary appservers, potentially dangerous change ahead
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1084 after schema change', diff saved to https://phabricator.wikimedia.org/P10831 and previous config saved to /var/cache/conftool/dbconfig/20200331-101953-marostegui.json
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json
* 10:03 XioNoX: add BGP to AS41327 in AMS-IX
* 09:49 XioNoX: push homer diffs to mr1-eqsin
* 09:36 XioNoX: push homer diffs to mr1-eqiad
* 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 09:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 09:05 vgutierrez: upload trafficserver 8.0.5-1wm6 to apt.wm.o (buster) - [[phab:T248938|T248938]]
* 09:00 vgutierrez: depool & decommission cp2011 - [[phab:T248950|T248950]]
* 08:44 vgutierrez: pool cp2035 - [[phab:T248816|T248816]]
* 08:31 mutante: signed puppet cert for planet1002.eqiad.wmnet
* 08:29 marostegui: Depool db1084 for schema change
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for schema change', diff saved to https://phabricator.wikimedia.org/P10829 and previous config saved to /var/cache/conftool/dbconfig/20200331-082904-marostegui.json
* 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1081 after schema change', diff saved to https://phabricator.wikimedia.org/P10828 and previous config saved to /var/cache/conftool/dbconfig/20200331-082711-marostegui.json
* 08:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 08:08 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:01 XioNoX: delete unused ROA for ARIN v4 prefixes - [[phab:T235886|T235886]]
* 07:49 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:17 vgutierrez: pool cp2034 - [[phab:T248816|T248816]]
* 07:16 marostegui: Deploy schema change on db1081
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1081 for schema change', diff saved to https://phabricator.wikimedia.org/P10827 and previous config saved to /var/cache/conftool/dbconfig/20200331-071547-marostegui.json
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P10826 and previous config saved to /var/cache/conftool/dbconfig/20200331-071401-marostegui.json
* 06:48 marostegui: Deploy schema change on db1103:3314
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P10825 and previous config saved to /var/cache/conftool/dbconfig/20200331-064707-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3314 after schema change', diff saved to https://phabricator.wikimedia.org/P10824 and previous config saved to /var/cache/conftool/dbconfig/20200331-064627-marostegui.json
* 05:55 marostegui: Drop nova and nova_api from m5 master (db1133) - [[phab:T248313|T248313]]
* 05:55 kart_: Updated cxserver to 2020-03-30-145349-production ([[phab:T248578|T248578]])
* 05:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 05:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 05:53 vgutierrez: depool && decommission cp2007 - [[phab:T248941|T248941]]
* 05:48 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 05:46 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 05:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:46 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 05:26 marostegui: Deploy schema change on db1097:3314
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314 for schema change', diff saved to https://phabricator.wikimedia.org/P10822 and previous config saved to /var/cache/conftool/dbconfig/20200331-051354-marostegui.json
* 00:26 eileen: civicrm revision changed from {{Gerrit|cf2e2c11c3}} to {{Gerrit|524b162174}}, config revision is {{Gerrit|708198a154}}


== 2020-03-30 ==
== 2021-07-29 ==
* 23:30 cdanis: cr3-esams: commit flex-flow-sizing [[phab:T248394|T248394]]
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 23:20 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]]
* 23:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Alphabetize wikis in each GrowthExperiments settings (duration: 00m 58s)
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 23:16 cdanis: cr2-esams: commit flex-flow-sizing [[phab:T248394|T248394]]
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 23:08 cdanis: cdanis@cr3-knams# commit comment "sensible flow table sizes [[phab:T248394|T248394]]"
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 22:56 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 58s)
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 22:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Provide wmgSiteLogoIcon (duration: 00m 57s)
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 22:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Set wmgSiteLogoIcon for each project family and four special wikis (duration: 00m 58s)
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 22:50 jforrester@deploy1001: Synchronized wmf-config/mobile.php: Set wgMobileFrontendLogo from wgLogos['icon'] if set (duration: 00m 59s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 22:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 00m 57s)
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 22:36 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Split wgLogos setting into wmgSiteLogo1x etc. (duration: 00m 59s)
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 22:33 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Construct wgLogos in CommonSettings so that projects can inherit values (duration: 01m 02s)
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 19:55 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 15:36 ejegg: updated payments listener (standalone SmashPig) from {{Gerrit|dc0c6b208b}} to {{Gerrit|d80e4c5abd}}
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 15:32 vgutierrez: pool cp2033 - [[phab:T248816|T248816]]
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 15:25 jeh: add icinga 2h downtime and soft reset iDRAC on labstore1005.mgmt.eqiad.wmnet [[phab:T247965|T247965]]
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 14:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 14:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 14:53 vgutierrez: depool & decommission cp2008 - [[phab:T248864|T248864]]
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 14:23 vgutierrez: pool cp2032 - [[phab:T248816|T248816]]
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 14:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:01 vgutierrez: depool & decommission cp2006 - [[phab:T248856|T248856]]
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 13:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:45 vgutierrez: pool cp2031 - [[phab:T248816|T248816]]
* 14:11 vgutierrez: restart pybal on lvs2009
* 13:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:09 vgutierrez: restart pybal on lvs2010
* 13:07 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 14:07 vgutierrez: restart pybal on lvs2008
* 13:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 14:05 vgutierrez: restart pybal on lvs2007
* 13:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:59 vgutierrez: restart pybal on lvs1014
* 12:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:55 vgutierrez: restart pybal on lvs1015
* 12:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 13:52 _joe_: restarting pybal on lvs1016
* 12:53 vgutierrez: depool & decommission cp2005 - [[phab:T248848|T248848]]
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 12:26 cdanis: cdanis@re0.cr2-codfw# set chassis fpc 5 inline-services flex-flow-sizing    cdanis@re0.cr2-codfw# commit comment "flex-flow-sizing [[phab:T248394|T248394]]"
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 12:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 12:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 12:21 vgutierrez: depool & decommission cp2004 - [[phab:T248824|T248824]]
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 12:03 XioNoX: delete unused ROA for ARIN v6 prefixes - [[phab:T235886|T235886]]
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 11:59 XioNoX: delete unused ROAs for RIPE prefixes - [[phab:T235886|T235886]]
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 11:42 mutante: miscweb2002 - race condition with apache2 mpm and php7.3 module met - a2dismond mpm_event ; systemctl restart apache2 ; puppet agent -tv (also see [[phab:T196968|T196968]], https://gerrit.wikimedia.org/r/c/operations/puppet/+/451206) [[phab:T247887|T247887]]
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 11:37 mutante: miscweb2002 - installed OS, added to puppet, added role and  ... sed -i 's/tin.eqiad/deployment.eqiad/g' /srv/deployment/iegreview/iegreview-cache/.config ([[phab:T247648|T247648]])
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 11:30 marostegui: Deploy schema change on dbstore1004:3314
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 11:22 XioNoX: delete ARIN allocations from RIPE's IRR - [[phab:T235886|T235886]]
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 11:11 Urbanecm: EU SWAT done
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ac7e625}}: Add collections.nmnh.si.edu to $wgCopyUploadsDomains ([[phab:T248659|T248659]]; take II) (duration: 00m 58s)
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ac7e625}}: Add collections.nmnh.si.edu to $wgCopyUploadsDomains ([[phab:T248659|T248659]]) (duration: 00m 58s)
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:08 vgutierrez: pool cp2030 - [[phab:T248816|T248816]]
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c8c06f9}}: Add 3 additional namespaces and assoicated talk pages to trwiktionary ([[phab:T248734|T248734]]; take II) (duration: 00m 59s)
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|c8c06f9}}: Add 3 additional namespaces and assoicated talk pages to trwiktionary ([[phab:T248734|T248734]]) (duration: 00m 59s)
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 10:43 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 10:34 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 10:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:33 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 07:52 moritzm: restarting Tomcat on idp-test
* 09:59 hoo: Temporary modified dumpsgen's crontab on snapshot1008 so that the Wikidata JSON dumps start at 9:59 UTC today ([[phab:T248612|T248612]])
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 09:56 hoo@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/Wikibase/repo/maintenance/DumpEntities.php: DumpEntities: Fix DB group default override ([[phab:T248612|T248612]]) (duration: 01m 02s)
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 09:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 09:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:30 vgutierrez: pool cp2029 - [[phab:T248816|T248816]]
* 08:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 08:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:53 vgutierrez: depool & decommission cp2002 - [[phab:T248818|T248818]]
* 07:48 marostegui: Run cloudcontrol1003:~# wmcs-wikireplica-dns to promote dbproxy1018 to wikireplicas active proxy [[phab:T231520|T231520]]
* 07:40 marostegui: Replace dbproxy1010 with dbproxy1011 for wiki replicas, analytics - [[phab:T231520|T231520]]
* 07:28 marostegui: Deploy schema change on labswiki (wikitech) - [[phab:T248333|T248333]]
* 07:26 marostegui: Deploy schema change on s4 codfw, this will generate lag on codfw - [[phab:T248333|T248333]]
* 07:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission
* 07:10 vgutierrez: depool and decommission cp2001 - [[phab:T248815|T248815]]
* 06:52 vgutierrez: pool cp2028 - [[phab:T247340|T247340]]
* 06:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 after schema change', diff saved to https://phabricator.wikimedia.org/P10813 and previous config saved to /var/cache/conftool/dbconfig/20200330-062858-marostegui.json
* 06:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:04 marostegui: Deploy schema change on db1074 with replication, this will generate lag on s2 labs
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for schema change', diff saved to https://phabricator.wikimedia.org/P10812 and previous config saved to /var/cache/conftool/dbconfig/20200330-060338-marostegui.json
* 05:40 vgutierrez: pool cp2027 - [[phab:T247340|T247340]]
* 05:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 04:55 vgutierrez: Enable TLS Session tickets in ulsfo - [[phab:T245616|T245616]]
* 04:32 vgutierrez: upgrade ATS to version 8.0.6-1wm4 on ulsfo - [[phab:T245616|T245616]]


== 2020-03-29 ==
== 2021-07-28 ==
* 08:24 elukey: powercycle elastic1059 - mgmt/serial console stuck, no ssh - racadm getsel shows a lot of OEM errors occurred, nothing specific
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 13:29 moritzm: installing python2.7 security updates on stretch
* 13:08 moritzm: installing python3.5 security updates on stretch
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:27 moritzm: installing nginx security updates on thumbor*
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:27 Amir1: running several long-running queries against pc1007
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 07:53 moritzm: installing aspell security updates on stretch
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php


== 2020-03-28 ==
== 2021-07-27 ==
* 16:54 elukey: restart yarn on analytics1071
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 12:05 vgutierrez: preemptive restart of ats-tls on cp1081 and cp3062 - [[phab:T248736|T248736]]
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 11:32 vgutierrez: restart ats-tls on cp1077 - [[phab:T248736|T248736]]
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 08:34 vgutierrez: pool cp1089
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 08:30 vgutierrez: restarting ats-tls on cp1089
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 14:33 mmandere: depool cp10[79-82]).eqiad.wmnet - [[phab:T286061|T286061]]
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:11 moritzm: installing aspell security updates
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support.  Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 11:23 Lucas_WMDE: EU backport+config window done
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 08:57 _joe_: repooling mw225[12] for apis
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 08:36 jynus: reenabled puppet on mwmaint1002
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 07:52 jynus: disabling puppet on mwmaint1002
* 07:14 moritzm: installing krb security updates on buster
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json


== 2020-03-27 ==
== 2021-07-26 ==
* 20:51 ejegg: updated payments-wiki from {{Gerrit|db618f429d}} to {{Gerrit|1640f5e21e}}
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 15:15 andrew@deploy1001: Finished deploy [horizon/deploy@33e67f9]: fix Identity->Projects with keystone Queens (duration: 03m 35s)
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}},
* 15:12 andrew@deploy1001: Started deploy [horizon/deploy@33e67f9]: fix Identity->Projects with keystone Queens
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129 after schema change', diff saved to https://phabricator.wikimedia.org/P10807 and previous config saved to /var/cache/conftool/dbconfig/20200327-144125-marostegui.json
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P10806 and previous config saved to /var/cache/conftool/dbconfig/20200327-142240-marostegui.json
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 14:19 moritzm: updating linux-image-4.9.0-11-amd64 where applicable
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 after schema change', diff saved to https://phabricator.wikimedia.org/P10805 and previous config saved to /var/cache/conftool/dbconfig/20200327-133022-marostegui.json
* 16:06 legoktm: depooled mw2336.codfw.mwnet, mgmt is down too. [[phab:T287394|T287394]]
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 for schema change', diff saved to https://phabricator.wikimedia.org/P10804 and previous config saved to /var/cache/conftool/dbconfig/20200327-130706-marostegui.json
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P10803 and previous config saved to /var/cache/conftool/dbconfig/20200327-130542-marostegui.json
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 12:49 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=labswiki --force "Ladsgroup" --interface-admin
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P10802 and previous config saved to /var/cache/conftool/dbconfig/20200327-122144-marostegui.json
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312 after schema change', diff saved to https://phabricator.wikimedia.org/P10801 and previous config saved to /var/cache/conftool/dbconfig/20200327-122058-marostegui.json
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P10800 and previous config saved to /var/cache/conftool/dbconfig/20200327-120234-marostegui.json
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase-backend,name=restbase202[123].codfw.wmnet
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:51 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase202[123].codfw.wmnet
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 11:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2023.codfw.wmnet
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 11:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2022.codfw.wmnet
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 11:44 oblivian@puppetmaster1001: conftool action : edit; selector: dc=codfw,cluster=restbase,service=restbase-ssl,name=restbase202[1].codfw.wmnet
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 11:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=restbase,service=restbase,name=restbase2021.codfw.wmnet
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 10:55 mutante: revoke puppet cert webserver-misc-apps.discovery.wmnet and recreate with additional SANs for new VMs
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 10:45 mutante: miscweb1002 - upload and unpack RackTables-0.21.4 ([[phab:T247646|T247646]] [[phab:T247648|T247648]])
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 10:28 marostegui: Alter db2125 s2 to set page_restrictions to default NULL - [[phab:T248333|T248333]]
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 10:12 mutante: miscweb1002 - sed -i 's/tin.eqiad/deployment.eqiad/g' /srv/deployment/iegreview/iegreview-cache/.config  [[phab:T247648|T247648]]
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:04 vgutierrez: upload trafficserver 8.0.6-1wm4 to apt.wm.o (buster) - [[phab:T245616|T245616]] [[phab:T170567|T170567]]
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:03 mutante: sodium - find /srv/mirrors/debian/ -user root -exec chown -h mirror:mirror <nowiki>{</nowiki><nowiki>}</nowiki> \;  (-h to also fix symbolic links); sudo -u mirror ftpsync ([[phab:T248660|T248660]])
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 10:02 marostegui: Alter db2084:3315 enwikivoyage.page to set page_restrictions to default NULL - [[phab:T248333|T248333]]
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 10:01 marostegui: Alter db1096:3315 enwikivoyage.page to set page_restrictions to default NULL - [[phab:T248333|T248333]]
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 09:37 mutante: sodium - running ftpsync as user mirror ([[phab:T248660|T248660]])
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 09:36 mutante: sodium fixing root owned files in /srv/mirrors/debian to be owned by mirror:mirror ([[phab:T248660|T248660]])
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P10799 and previous config saved to /var/cache/conftool/dbconfig/20200327-093214-marostegui.json
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P10798 and previous config saved to /var/cache/conftool/dbconfig/20200327-093106-marostegui.json
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 07:58 marostegui: Deploy schema change on s2 codfw - this will generate lag on s2 codfw - [[phab:T248333|T248333]]
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 07:36 elukey: execute 'rm /etc/logrotate.d/ceph-common' on cloudvirt[1,2]* and cloudcontrol* to stop daily cronspam (file not in the puppet catalog anymore)
* 06:39 moritzm: installing krb5 security updates
* 07:32 moritzm: installing grub2 updates from Stretch point release
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P10796 and previous config saved to /var/cache/conftool/dbconfig/20200327-072334-marostegui.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P10795 and previous config saved to /var/cache/conftool/dbconfig/20200327-070224-marostegui.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1082 after schema change', diff saved to https://phabricator.wikimedia.org/P10794 and previous config saved to /var/cache/conftool/dbconfig/20200327-070014-marostegui.json
* 06:31 marostegui: Deploy schema change on db1082, this will generate lag on s5 labs
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 for schema change', diff saved to https://phabricator.wikimedia.org/P10793 and previous config saved to /var/cache/conftool/dbconfig/20200327-063042-marostegui.json


== 2020-03-26 ==
== 2021-07-24 ==
* 23:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ce63a4e}}: Enable wmgUseFooterContactLink for cswiki ([[phab:T248584|T248584]]; take II) (duration: 00m 57s)
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
* 23:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|ce63a4e}}: Enable wmgUseFooterContactLink for cswiki ([[phab:T248584|T248584]]) (duration: 00m 58s)
* 22:51 krinkle@deploy1001: Synchronized php-1.35.0-wmf.25/includes/user/UserRightsProxy.php: {{Gerrit|I9121f5aae}} (4/4) (duration: 00m 58s)
* 22:50 krinkle@deploy1001: Synchronized php-1.35.0-wmf.25/includes/search/SearchMySQL.php: {{Gerrit|I9121f5aae}} (3/4) (duration: 00m 58s)
* 22:48 krinkle@deploy1001: Synchronized php-1.35.0-wmf.25/includes/objectcache/SqlBagOStuff.php: {{Gerrit|I9121f5aae}} (2/4) (duration: 00m 58s)
* 22:44 krinkle@deploy1001: Synchronized php-1.35.0-wmf.25/includes/jobqueue/jobs/RecentChangesUpdateJob.php: {{Gerrit|I9121f5aae}} (1/4) (duration: 01m 00s)
* 22:05 ejegg: updated fundraising CiviCRM from {{Gerrit|f1cb23e809}} to {{Gerrit|cf2e2c11c3}}
* 21:43 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/MachineVision: Fix: Stop sorting label suggestions by Wikidata ID in ApiQueryImageLabels (duration: 01m 00s)
* 21:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:32 cdanis: cdanis@re0.cr1-eqsin# set chassis afeb slot 0 inline-services flex-flow-sizing    cdanis@re0.cr1-eqsin# commit comment "flex-flow-sizing [[phab:T248394|T248394]]"
* 21:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:30 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:27 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@f34260c]: Update mobileapps to {{Gerrit|3f30f20c}} (duration: 03m 07s)
* 21:24 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@f34260c]: Update mobileapps to {{Gerrit|3f30f20c}}
* 21:15 cdanis: repool ulsfo
* 21:12 cdanis: applied flow-table-size configuration to cr4-ulsfo which did not need a reboot to apply it [[phab:T248394|T248394]]
* 20:51 cdanis: cdanis@cr3-ulsfo> request system reboot
* 20:36 cdanis: depool ulsfo
* 16:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:34 XioNoX: stop exchanging full BGP view between eqiad and codfw - [[phab:T246721|T246721]]
* 16:19 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:18 XioNoX: stop advertising 208.80.152.0/22 from eqiad - [[phab:T246721|T246721]]
* 16:15 mutante: signing puppet cert for miscweb1002, installed buster, added insetup role ([[phab:T247887|T247887]])
* 16:15 ebernhardson: set cloudelastic-chi wikidatawiki_content to 0 replicas while reindexing
* 16:14 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:14 moritzm: rebooting mw2150 for some tests
* 16:12 XioNoX: stop advertising 2620:0:860::/46 from eqiad - [[phab:T246721|T246721]]
* 16:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:11 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:10 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:58 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:53 volans@cumin1001: START - Cookbook sre.dns.netbox
* 15:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:51 moritzm: installing grub2 updates from Stretch point release
* 15:49 XioNoX: start advertising 208.80.154.0/23 from eqiad - [[phab:T246721|T246721]]
* 15:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:46 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:40 XioNoX: start advertising 2620:0:861::/48 from eqiad - [[phab:T246721|T246721]]
* 15:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:10 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:01 mutante: [[phab:T247887|T247887]] - create Ganeti VM miscweb1002.eqiad.wmnet in the ganeti01.svc.eqiad.wmnet cluster on row C with 1 vCPUs, 2GB of RAM, 20GB of disk in the private network.
* 15:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:59 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 14:59 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:50 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:47 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:26 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:23 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P10787 and previous config saved to /var/cache/conftool/dbconfig/20200326-135625-marostegui.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P10786 and previous config saved to /var/cache/conftool/dbconfig/20200326-132940-marostegui.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P10785 and previous config saved to /var/cache/conftool/dbconfig/20200326-130122-marostegui.json
* 12:57 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: eventgate-main to use envoy [[phab:T244843|T244843]] (duration: 01m 07s)
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P10784 and previous config saved to /var/cache/conftool/dbconfig/20200326-123302-marostegui.json
* 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P10783 and previous config saved to /var/cache/conftool/dbconfig/20200326-123157-marostegui.json
* 12:25 mutante: analytics1028 - performing a puppet change on every run (all other hosts doing this were fixed just recently)
* 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P10782 and previous config saved to /var/cache/conftool/dbconfig/20200326-121859-marostegui.json
* 11:38 awight: EU SWAT done
* 11:37 awight@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/TwoColConflict: SWAT: [[gerrit:583576{{!}}Two hotfixes for guided tour (T248465)]] (duration: 01m 07s)
* 11:25 mutante: sodium - running ftpsync to get Debian mirror in sync
* 11:23 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T231517|T231517]]: [cirrus] force cloudelastic replica count to 1 (duration: 01m 05s)
* 11:21 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T231517|T231517]]: [cirrus] force cloudelastic replica count to 1 (duration: 01m 06s)
* 11:12 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/ContentTranslation/modules/ui/mw.cx.ui.Categories.js: SWAT: {{Gerrit|1ea6bad}}: Allow publishing to continue even with broken categories ([[phab:T248302|T248302]]) (duration: 01m 07s)
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|d1bb0b1}}: Removed expired throttle.php entries (duration: 01m 09s)
* 11:00 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:58 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:54 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:16 XioNoX: esams/knams: advertise 185.15.58.0/23 instead of 185.15.56.0/22 - [[phab:T207753|T207753]]
* 09:50 elukey: reboot stat1008 - gpu + drivers in a weird state after multiple tests
* 09:00 XioNoX: push v4 conditional advertising on cr3-knams - [[phab:T236785|T236785]]
* 08:44 marostegui: Deploy schema change on s5 codfw, lag will show up on codfw - [[phab:T248333|T248333]]
* 08:27 XioNoX: troubleshot v6 conditional advertisement from cr3-knams - [[phab:T236785|T236785]]
* 07:58 XioNoX: remove BGP session to AS8001 in eqiad (down and not replying to email)
* 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1085 after schema change', diff saved to https://phabricator.wikimedia.org/P10781 and previous config saved to /var/cache/conftool/dbconfig/20200326-074033-marostegui.json
* 07:31 marostegui: Deploy schema change on db1085, lag will appear on s6 on labs
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 for schema change', diff saved to https://phabricator.wikimedia.org/P10780 and previous config saved to /var/cache/conftool/dbconfig/20200326-073048-marostegui.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1093 after schema change', diff saved to https://phabricator.wikimedia.org/P10779 and previous config saved to /var/cache/conftool/dbconfig/20200326-070746-marostegui.json
* 06:59 marostegui: Deploy schema change on db1093
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for schema change', diff saved to https://phabricator.wikimedia.org/P10778 and previous config saved to /var/cache/conftool/dbconfig/20200326-065929-marostegui.json
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1088 after schema change', diff saved to https://phabricator.wikimedia.org/P10777 and previous config saved to /var/cache/conftool/dbconfig/20200326-065814-marostegui.json
* 06:48 marostegui: Deploy schema change on db1088
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for schema change', diff saved to https://phabricator.wikimedia.org/P10776 and previous config saved to /var/cache/conftool/dbconfig/20200326-064748-marostegui.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P10775 and previous config saved to /var/cache/conftool/dbconfig/20200326-064648-marostegui.json
* 06:39 marostegui: Deploy schema change on db1098:3316
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P10774 and previous config saved to /var/cache/conftool/dbconfig/20200326-063844-marostegui.json
* 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P10773 and previous config saved to /var/cache/conftool/dbconfig/20200326-063633-marostegui.json
* 06:26 marostegui: Deploy schema change on db1096:3316
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P10772 and previous config saved to /var/cache/conftool/dbconfig/20200326-062631-marostegui.json
* 06:22 marostegui: Rename nova and nova_api tables on db1117:3325 - [[phab:T248313|T248313]]
* 00:06 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable Special:Investigate on testwiki ([[phab:T247645|T247645]]) (duration: 03m 14s)


== 2020-03-25 ==
== 2021-07-23 ==
* 23:49 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Add investigate to $wgAvailableRights ([[phab:T247645|T247645]]) (duration: 03m 16s)
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 23:42 catrope@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/CheckUser/: Retry because mw1251 timed out, and it is a proxy (duration: 03m 15s)
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 23:38 catrope@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/CheckUser/: Add new investigate right ([[phab:T247645|T247645]]) (duration: 03m 17s)
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 22:21 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 22:21 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 22:16 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 22:16 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 16:15 effie: enable puppet on mc-gp* hosts
* 22:10 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 22:10 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 22:05 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 22:05 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 22:05 rlazarus: updating eventgate-logging-external to envoy 1.13.1 [[phab:T246868|T246868]]
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 22:00 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 22:00 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 21:59 ppchelko@deploy1001: Finished deploy [restbase/deploy@a1c3be4] (dev-cluster): Remove experimental PCS endpoints (duration: 02m 57s)
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 21:56 ppchelko@deploy1001: Started deploy [restbase/deploy@a1c3be4] (dev-cluster): Remove experimental PCS endpoints
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 21:54 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 21:54 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 21:46 urandom: dropping unused Cassandra keyspaces -- [[phab:T248018|T248018]]
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 21:45 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 21:44 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 21:44 rlazarus: updating eventgate-analytics-external to envoy 1.13.1 [[phab:T246868|T246868]]
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 21:39 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 21:39 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 21:27 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 21:27 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 21:16 rlazarus: holding off on updating eventgate-analytics until EU time, to check on unexpected helmfile diffs [[phab:T246868|T246868]]
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 21:11 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 21:11 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 21:10 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 21:10 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 21:07 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 21:07 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 21:07 rlazarus: updating eventgate-analytics to envoy 1.13.1 [[phab:T246868|T246868]]
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 20:36 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 20:32 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 20:22 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 20:22 rlazarus: updating cxserver to envoy 1.13.1 [[phab:T246868|T246868]]
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 20:19 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 20:19 rlazarus: updating citoid to envoy 1.13.1 [[phab:T246868|T246868]]
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 20:16 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 20:16 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 20:01 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 20:01 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 19:36 hasharDinner: Jenkins restarted on all machines
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]
* 19:30 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 19:30 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 19:29 rlazarus: updating eventstreams to envoy 1.13.1 [[phab:T246868|T246868]]
* 19:28 twentyafterfour: group1 looks good after deploying wmf.25 refs [[phab:T233873|T233873]]
* 19:27 hashar: upgrading Jenkins # [[phab:T248122|T248122]]
* 19:26 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.25  refs [[phab:T233873|T233873]]
* 19:26 twentyafterfour: scap sync-proxies failed on mw1251
* 18:53 ppchelko@deploy1001: Finished deploy [restbase/deploy@a1c3be4]: Add restbase202[123] [[phab:T244178|T244178]] (duration: 14m 00s)
* 18:39 ppchelko@deploy1001: Started deploy [restbase/deploy@a1c3be4]: Add restbase202[123] [[phab:T244178|T244178]]
* 18:39 ppchelko@deploy1001: Finished deploy [restbase/deploy@777b881]: Remove experimental PCS endpoints (duration: 14m 28s)
* 18:24 ppchelko@deploy1001: Started deploy [restbase/deploy@777b881]: Remove experimental PCS endpoints
* 18:21 tgr@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/GrowthExperiments/modules/homepage/: re-sync, mw1251 failed (duration: 03m 18s)
* 18:13 tgr@deploy1001: Synchronized php-1.35.0-wmf.25/extensions/GrowthExperiments/modules/homepage/: SWAT: [[gerrit:583393{{!}}Mentorship module: Update for root screen refactor (T248422)]] (duration: 03m 23s)
* 18:06 ppchelko@deploy1001: Finished deploy [changeprop/deploy@4bdf55b]: Stop rerendering experimental PCS endpoints (duration: 01m 40s)
* 18:05 ppchelko@deploy1001: Started deploy [changeprop/deploy@4bdf55b]: Stop rerendering experimental PCS endpoints
* 17:43 mvolz@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 17:38 mvolz@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 17:33 mvolz@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 16:50 moritzm: installing python-bleach security updates
* 16:47 moritzm: updated jenkins packages on apt.wikimedia.org to 2.222.1
* 16:33 rzl@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 16:32 sukhe: upload cescout 0.1.0-1 to apt.wm.o (buster) - [[phab:T247273|T247273]]
* 16:17 rzl@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'blubberoid' for release 'production' .
* 16:15 rzl@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'blubberoid' for release 'staging' .
* 16:07 rlazarus: updating blubberoid to envoy 1.13.1 [[phab:T246868|T246868]]
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2115 after reimage to Buster', diff saved to https://phabricator.wikimedia.org/P10767 and previous config saved to /var/cache/conftool/dbconfig/20200325-152148-marostegui.json
* 15:14 moritzm: installing deneb.codfw.wmnet [[phab:T248165|T248165]]
* 14:51 cdanis: repool codfw [[phab:T248394|T248394]]
* 14:46 mutante: closed port 80 for caching servers on misc backends https://gerrit.wikimedia.org/r/q/topic:%22applayer-tls%22+(status:open%20OR%20status:merged) as final step per service on [[phab:T210411|T210411]]
* 14:39 mutante: static microsites (annual.wikimedia.org, research.wikimedia.org, static-bugzilla etc). closed port 80 for caching servers, finalizing switch to https behind caching servers
* 14:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:48 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:26 _joe_: cumin A:puppetmaster 'apt-get -y install puppet-common'
* 13:03 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 12:56 marostegui: Deploy schema change on db1139:3316
* 12:45 marostegui: Stop MySQL on db2115 for reimage to buster
* 11:50 cdanis: cr1-codfw: `set chassis fpc 5 inline-services flex-flow-sizing` and `request chassis fpc restart slot 5` [[phab:T248394|T248394]]
* 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2115 for upgrade', diff saved to https://phabricator.wikimedia.org/P10763 and previous config saved to /var/cache/conftool/dbconfig/20200325-114655-marostegui.json
* 11:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:37 mutante: decom mw1250 - mw1253
* 11:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 11:35 cdanis: depool codfw for router maintenance [[phab:T248394|T248394]]
* 11:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:32 mutante: decom mw1232 - mw1235
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 11:27 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw125[0-3].eqiad.wmnet
* 11:26 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw123[2-5].eqiad.wmnet
* 11:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 11:21 Urbanecm: EU SWAT done
* 11:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 11:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw123[2-5].eqiad.wmnet
* 11:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw125[0-3].eqiad.wmnet
* 11:19 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|59412db}}: Add gwtoolset to available rights to allow granting to global groups (duration: 01m 07s)
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|7b8d7c5}}: TwoColConflict: Limited default deployment CommonSettings.php ([[phab:T244863|T244863]]) (duration: 01m 06s)
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|81cda0f}}: TwoColConflict: Limited default deployment InitialiseSettings.php ([[phab:T244863|T244863]]; take II) (duration: 01m 06s)
* 11:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|81cda0f}}: TwoColConflict: Limited default deployment InitialiseSettings.php ([[phab:T244863|T244863]]) (duration: 01m 17s)
* 11:08 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1091 load, increase main traffic on all other s4 instances', diff saved to https://phabricator.wikimedia.org/P10762 and previous config saved to /var/cache/conftool/dbconfig/20200325-110821-jynus.json
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1137', diff saved to https://phabricator.wikimedia.org/P10761 and previous config saved to /var/cache/conftool/dbconfig/20200325-105503-marostegui.json
* 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10760 and previous config saved to /var/cache/conftool/dbconfig/20200325-103938-marostegui.json
* 10:37 XioNoX: change aggregate policy for 2620:0:862::/48 on cr3-knams - [[phab:T236785|T236785]]
* 10:19 XioNoX: change aggregate policy for v4 prefixes on cr2-eqdfw - [[phab:T236785|T236785]]
* 10:04 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 10:04 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 09:56 XioNoX: change aggregate policy for 2620:0:860::/46 on cr2-eqdfw - [[phab:T236785|T236785]]
* 09:54 vgutierrez: Enable inbound TLSv1.3 on upload@eqsin - [[phab:T170567|T170567]]
* 09:27 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:23 vgutierrez: upgrade ATS to 8.0.6-1wm3 on upload@eqsin - [[phab:T170567|T170567]]
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10759 and previous config saved to /var/cache/conftool/dbconfig/20200325-091421-marostegui.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1137', diff saved to https://phabricator.wikimedia.org/P10758 and previous config saved to /var/cache/conftool/dbconfig/20200325-090227-marostegui.json
* 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:38 marostegui: Reimage db1137
* 08:18 marostegui: Reboot db1117 for full-upgrade
* 08:15 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 08:15 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 08:14 _joe_: upgrading all eventgate-main to envoy 1.13.1 [[phab:T246868|T246868]]
* 08:12 marostegui: Stop all mysql daemons on db1117
* 07:50 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 07:50 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 07:42 XioNoX: reboot scs-eqsin for CPU usage
* 07:20 jmm@cumin2001: START - Cookbook sre.ganeti.makevm
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 for upgrade', diff saved to https://phabricator.wikimedia.org/P10757 and previous config saved to /var/cache/conftool/dbconfig/20200325-070946-marostegui.json
* 06:57 marostegui: Deploy schema change on db2129 (s6 codfw master)
* 06:15 marostegui: Rename tables on db1133 (m5 master) nova_api database - [[phab:T248313|T248313]]
* 06:13 marostegui: Remove grants 'nova'@'208.80.154.23' on nova.* - [[phab:T248313|T248313]]


== 2020-03-24 ==
== 2021-07-22 ==
* 20:53 cdanis: repool eqsin
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 20:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Don't hard-set wgTmhUseBetaFeatures to true, let it vary by wiki (duration: 01m 07s)
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platfom migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 20:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 07s)
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 20:49 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgTmhUseBetaFeatures to vary by wiki (duration: 01m 06s)
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 20:35 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: Attempt #2: group0 wikis to 1.35.0-wmf.25 refs [[phab:T233873|T233873]]
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 20:32 twentyafterfour@deploy1001: Synchronized wmf-config: Now touch and sync again because of settings cache rache condition. refs [[phab:T248409|T248409]] (duration: 00m 59s)
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 20:31 cdanis: rebooting cr2-eqsin [[phab:T248394|T248394]]
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 20:30 twentyafterfour@deploy1001: Synchronized wmf-config: Now sync InitializeSettings* refs [[phab:T248409|T248409]] (duration: 00m 59s)
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 20:28 twentyafterfour@deploy1001: Synchronized wmf-config/CommonSettings.php: sync CommonSettings before InitialiseSettings refs [[phab:T248409|T248409]] (duration: 00m 58s)
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 20:27 volans: force rebooting analytics1044 from console, host down and unreachable (ping, ssh, console)
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 20:26 cdanis: commit flow-table-size on cr2-eqsin [[phab:T248394|T248394]]
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 20:19 cdanis: eqsin depooled for router maintenance at 16:15
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 19:29 twentyafterfour@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 19:29 twentyafterfour: rolling back to wmf.24 due to high error rate refs [[phab:T233873|T233873]]
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 19:28 twentyafterfour@deploy1001: scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 18:49 gehel: repooling wdqs1006, catched up on lag
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 17:12 hashar@deploy1001: Finished scap: testwiki to 1.35.0-wmf.25 and rebuild l10n cache # [[phab:T233873|T233873]] (duration: 77m 52s)
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 17:10 ebernhardson: update cloudelastic-chi replica counts from 2 to 1 [[phab:T231517|T231517]]
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 16:41 moritzm: installing linux-perf updates on stretch
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 16:31 moritzm: installing linux-perf-4.19 updates on buster
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 15:58 mutante: installing OS on otrs1001.eqiad.wmnet ([[phab:T248028|T248028]])
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 15:55 hashar@deploy1001: Started scap: testwiki to 1.35.0-wmf.25 and rebuild l10n cache # [[phab:T233873|T233873]]
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 15:35 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 15:31 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.22 (duration: 02m 02s)
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 15:29 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.21 (duration: 24m 00s)
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 15:17 hashar: Cleaning old MediaWiki deployments # [[phab:T233873|T233873]]
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 15:03 hashar: Applied patches to 1.35.0-wmf.25 # [[phab:T233873|T233873]]
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 14:59 hashar: scap prep 1.35.0-wmf.25 # [[phab:T233873|T233873]]
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 14:55 gehel: depooling wdqs1006 to catch up on lag
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 14:28 marostegui: Deploy schema change on db2117 (s6)
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 14:26 hashar: Branching wmf/1.35.0-wmf.25 # [[phab:T233873|T233873]]
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 13:22 moritzm: installing glib2.0 updates from Stretch point release
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 13:04 moritzm: installing maridb-10.1 updates from Stretch point release (client/tools/libraries as packaged by Debian, different from wmf-mariadb)
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 12:16 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Toroid~huwiki' 'Toroidt' ([[phab:T248371|T248371]])
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 12:10 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Erika Greenberg' 'Copperqueen' ([[phab:T248371|T248371]])
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 11:57 Urbanecm: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Romy merdeka' 'Romy_Dwi_Laksono' ([[phab:T248371|T248371]])
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 11:55 marostegui: Deploy schema change on db2087 db2089 db2097
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 11:34 Urbanecm: EU SWAT done
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 11:29 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e28c819}}: Enable visualeditor on hewiktionary by default ([[phab:T248311|T248311]]; take II) (duration: 00m 59s)
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e28c819}}: Enable visualeditor on hewiktionary by default ([[phab:T248311|T248311]]) (duration: 00m 59s)
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 11:25 urbanecm@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: SWAT: {{Gerrit|e28c819}}: Enable visualeditor on hewiktionary by default ([[phab:T248311|T248311]]) (duration: 01m 03s)
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 10:08 gehel: restart blazegraph and updater on wdqs1004
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 09:41 marostegui: Deploy schema change on db2076 (s6)
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 08:39 marostegui: Rename nova database tables on db1133 (m5 master) - [[phab:T248313|T248313]]
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 08:25 marostegui: Rename wikidatawiki.wb_terms on db1104 - [[phab:T248086|T248086]]
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 07:33 elukey: restart update-openstack-mirror.service on sodium
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 06:55 marostegui: Reboot dbproxy1018
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 06:42 marostegui: Reboot dbproxy1019
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 06:16 marostegui: Create empty database testreduce on m5 master [[phab:T245408|T245408]]
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1087, vslow s8, with weight 1 as it originally had', diff saved to https://phabricator.wikimedia.org/P10753 and previous config saved to /var/cache/conftool/dbconfig/20200324-060133-marostegui.json
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 14:27 moritzm: installing libwebp security updates on stretch
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 11:36 Lucas_WMDE: EU backport+config window done
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:42 effie: restart pybal on lvs2010  and lvs1016
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE


== 2020-03-23 ==
== 2021-07-21 ==
* 21:50 krinkle@deploy1001: Synchronized docroot/noc/css/vector.css: {{Gerrit|I627a0ddba5}} (duration: 01m 02s)
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 21:39 mholloway-shell@deploy1001: Finished deploy [recommendation-api/deploy@26aa5c3]: Update recommendation-api to {{Gerrit|3141cb6}} (duration: 03m 21s)
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:45 Urbanecm: Morning SWAT done
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 18:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0e535b1}}: InitialiseSettings - clean up groupOverrides layout / spacing ([[phab:T231178|T231178]]; take II) (duration: 00m 59s)
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0e535b1}}: InitialiseSettings - clean up groupOverrides layout / spacing ([[phab:T231178|T231178]]) (duration: 01m 00s)
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|6ca1593}}: wgCopyUploadsDomains: Fix supremecourt.gov ([[phab:T248146|T248146]]; take II) (duration: 00m 59s)
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 18:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|6ca1593}}: wgCopyUploadsDomains: Fix supremecourt.gov ([[phab:T248146|T248146]]) (duration: 01m 00s)
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:32 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: SWAT: {{Gerrit|cbda0e5}}: ApiVisualEditorEdit: Fix handling of minor parameter ([[phab:T248257|T248257]]) (duration: 01m 00s)
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 18:24 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|212114e}}: Dont try to grant `oathauth-enable` to `*` ([[phab:T248282|T248282]]) (duration: 00m 59s)
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0c12fc2}}: wgCopyUploadsDomains: Add supremecourt.gov ([[phab:T248146|T248146]], take II) (duration: 00m 59s)
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0c12fc2}}: wgCopyUploadsDomains: Add supremecourt.gov ([[phab:T248146|T248146]]) (duration: 01m 00s)
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:18 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 20:27 dancy: testing upcoming Scap release on beta
* 18:18 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5eb70ac}}: Add configuration variable $wgRestAPIAdditionalRouteFiles ([[phab:T247997|T247997]]; take II) (duration: 00m 59s)
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 18:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5eb70ac}}: Add configuration variable $wgRestAPIAdditionalRouteFiles ([[phab:T247997|T247997]]) (duration: 01m 00s)
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 18:09 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 18:09 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 18:08 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 18:05 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 18:05 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 17:57 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 17:57 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 16:31 ema: upload atskafka 0.5 to buster-wikimedia [[phab:T237993|T237993]]
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 15:59 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Re-enablle client side error logging for group0 and hawwike - [[phab:T226986|T226986]] (take 2) (duration: 00m 59s)
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 15:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Re-enablle client side error logging for group0 and hawwike - [[phab:T226986|T226986]] (duration: 01m 00s)
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 15:32 moritzm: installing maridb-10.1 updates from Stretch point release (client/tools/libraries as packaged by Debian, different from wmf-mariadb)
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 15:24 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 15:13 moritzm: installing freetype updates from Stretch point release
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 15:04 otto@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: [[gerrit:578951{{!}}clientError: Changes event fields (T226986)]] (take 2) (duration: 00m 59s)
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 15:00 jynus@cumin1001: dbctl commit (dc=all): 'Remove db1089 for special groups (rc)', diff saved to https://phabricator.wikimedia.org/P10749 and previous config saved to /var/cache/conftool/dbconfig/20200323-150046-jynus.json
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:00 otto@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/WikimediaEvents/modules/ext.wikimediaEvents/clientError.js: [[gerrit:578951{{!}}clientError: Changes event fields (T226986)]] (duration: 01m 01s)
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 14:46 jynus@cumin1001: dbctl commit (dc=all): 'Finish doubling db1107 main s1 traffic', diff saved to https://phabricator.wikimedia.org/P10748 and previous config saved to /var/cache/conftool/dbconfig/20200323-144612-jynus.json
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 14:40 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1107 main s1 traffic a 50%', diff saved to https://phabricator.wikimedia.org/P10747 and previous config saved to /var/cache/conftool/dbconfig/20200323-144005-jynus.json
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 14:35 jynus@cumin1001: dbctl commit (dc=all): 'remove db1107 from special groups', diff saved to https://phabricator.wikimedia.org/P10746 and previous config saved to /var/cache/conftool/dbconfig/20200323-143536-jynus.json
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 14:28 elukey@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 14:28 elukey@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 14:25 elukey@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 14:25 elukey@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 14:13 elukey@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 14:13 elukey@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 13:54 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 13:40 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Temporarily disable client side error logging for a deploy - [[phab:T226986|T226986]] (duration: 01m 01s)
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 13:33 moritzm: installing python-cryptography updates from Stretch point release
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 12:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 11:41 tgr@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/OAuth/includes/frontend/specialpages/SpecialMWOAuthManageMyGrants.php: SWAT: [[gerrit:582768{{!}}Get consumerKey from consumerId not from acceptanceId (T247531)]] (duration: 01m 01s)
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 11:32 ema: cp1081: restart prometheus-trafficserver-tls-exporter.service
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 11:27 elukey: upload oozie 4.3.0-3 to thirparty/bigtop14 on wikimedia-stretch - [[phab:T244499|T244499]]
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 10:37 jbond42: switch idp1001 to tlsproxy::envoy profile
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 08:07 marostegui: Start m1 and m2 on db1117
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 08:04 marostegui: Stop m1 and m2 on db1117 to transfer them to db1077 - this will trigger dbproxies IRC alert
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 08:03 moritzm: installing python-cryptography bug fix updates from Stretch point release
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 07:46 marostegui: Stop MySQL on db1077 (non used) for 10.4 upgrade and gtid_domain_id on multisource [[phab:T149418|T149418]]
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 10:50 moritzm: installing systemd security updates on bullseye
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 10:14 effie: enable puppet on mw* servers
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 08:45 effie: disble puppet on codfw mw hosts to deploy 702592
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:17 effie: enable puppet on alert*
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 07:16 godog: powercycle ms-be2048
* 07:03 moritzm: installing systemd security updates on stretch
* 06:51 effie: restart memcached on eqiad mc* hosts
* 06:51 effie: enable puppet on mc* hosts
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2020-03-22 ==
== 2021-07-20 ==
* 23:19 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: [[phab:T248274|T248274]] (duration: 01m 19s)
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 04:37 gehel@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 17:06 rzl: enabled puppet on A:mw
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
* 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 14:53 urbanecm: Start server-side upload for 7 large PNG files ([[phab:T285708|T285708]])
* 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:46 vgutierrez: depool dns1002 - [[phab:T286069|T286069]]
* 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
* 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10{{!}}0[1-9]).codfw.wmnet
* 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - [[phab:T285643|T285643]]
* 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 12:44 moritzm: installing systemd security updates on buster
* 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
* 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
* 11:58 Lucas_WMDE: EU config+backport window done
* 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (3/3) (duration: 00m 56s)
* 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (2/3) (duration: 00m 56s)
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (1/3) (duration: 00m 56s)
* 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 3/3) (duration: 00m 56s)
* 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 2/3) (duration: 00m 56s)
* 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 1/3) (duration: 00m 57s)
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705498{{!}}Add patroller group for ckbwiki (T285221)]] (duration: 00m 57s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (2/2, beta) (duration: 00m 56s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (1/2, prod) (duration: 00m 57s)
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704867{{!}}Update config for language switching on pilot wikis (T286459)]] (duration: 00m 59s)
* 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
* 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
* 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
* 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
* 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
* 03:17 eileen: civicrm revision changed from {{Gerrit|20e9ef6bbb}} to {{Gerrit|819c11307d}}, config revision is {{Gerrit|bb405c5232}}


== 2020-03-20 ==
== 2021-07-19 ==
* 23:16 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:48 urbanecm: Deploy security patch for [[phab:T286884|T286884]]
* 23:04 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:29 vgutierrez: pool text@codfw - [[phab:T286921|T286921]]
* 21:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:04 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
* 21:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467{{!}}prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)
* 20:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
* 20:59 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448{{!}}Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)
* 20:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - [[phab:T286921|T286921]]
* 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - [[phab:T286921|T286921]]
* 20:56 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:46 brennen: gerrit1001: restarting gerrit
* 20:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:40 vgutierrez: stop pybal on lvs2009  - [[phab:T286921|T286921]]
* 20:53 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:38 brennen: re-enabling puppet on gerrit1001]
* 20:41 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw124[4-9].eqiad.wmnet
* 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - [[phab:T286921|T286921]]
* 20:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 20:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
* 20:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P<nowiki>{</nowiki>cloudelastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 20:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - [[phab:T286921|T286921]]
* 20:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:20 vgutierrez: enabling pybal on lvs2007 - [[phab:T286921|T286921]]
* 20:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:19 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 20:37 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw123[0-1].eqiad.wmnet
* 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
* 20:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw122[7-9].eqiad.wmnet
* 18:06 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:705437{{!}}pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s)
* 20:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw124[4-9].eqiad.wmnet
* 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 20:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw123[0-1].eqiad.wmnet
* 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 20:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw122[7-9].eqiad.wmnet
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 15:44 hashar@deploy1001: Synchronized php-1.35.0-wmf.24/includes/ActorMigration.php: Avoid upsert() log warning spam in ActorMigration due to unique key array format - [[phab:T248147|T248147]] (duration: 01m 01s)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 13:34 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
* 13:33 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 13:33 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db1087, vslow host weight in main, given that the CPU across s8 is now doing a lot better', diff saved to https://phabricator.wikimedia.org/P10741 and previous config saved to /var/cache/conftool/dbconfig/20200320-121628-marostegui.json
* 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 11:52 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
* 11:10 elukey: upload oozie 4.3.0-2 packages to thirdparty/bigtop14 on wikimedia-stretch
* 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 10:56 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
* 10:56 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
* 10:34 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 17:30 volans: running puppet on elastic2038 after nework was restored
* 10:29 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
* 10:13 dcausse: repooling wdqs1006
* 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 09:28 moritzm: rolling restart of FPM on mw1261-mw1265 for freetype update
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
* 08:59 moritzm: installing freetype bugfix updates from stretch point release
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1017', diff saved to https://phabricator.wikimedia.org/P10739 and previous config saved to /var/cache/conftool/dbconfig/20200320-084730-marostegui.json
* 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1017', diff saved to https://phabricator.wikimedia.org/P10738 and previous config saved to /var/cache/conftool/dbconfig/20200320-083334-marostegui.json
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 07:59 XioNoX: reorder LVS BGP neighbors and add descriptions - https://gerrit.wikimedia.org/r/576320
* 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1017', diff saved to https://phabricator.wikimedia.org/P10737 and previous config saved to /var/cache/conftool/dbconfig/20200320-074816-marostegui.json
* 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 07:46 elukey: upload hadoop_2.8.5-2 (and related debs) to thirdparty/bigtop14 on wikimedia-stretch (manually rebuilt via docker after patch backports from upstream)
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1017', diff saved to https://phabricator.wikimedia.org/P10736 and previous config saved to /var/cache/conftool/dbconfig/20200320-073205-marostegui.json
* 17:23 volans: running authdns-update to force-update authdns2001
* 07:26 marostegui: Restart mysql on es1017 for upgrade - [[phab:T239791|T239791]]
* 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1017 for update [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10735 and previous config saved to /var/cache/conftool/dbconfig/20200320-070945-marostegui.json
* 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1014 to es3 master, this is a NOOP [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10734 and previous config saved to /var/cache/conftool/dbconfig/20200320-070922-marostegui.json
* 17:21 XioNoX: remove ns1 redirect - [[phab:T286787|T286787]]
* 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
* 17:10 XioNoX: enable asw-a2-codfw access ports - [[phab:T286787|T286787]]
* 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - [[phab:T286787|T286787]]
* 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
* 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
* 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
* 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
* 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
* 16:40 XioNoX: update asw-a2-codfw serial number - [[phab:T286787|T286787]]
* 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
* 16:21 mutante: depooled logstash2021 for dcops maintenance work
* 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
* 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
* 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|310be45f7}} (duration: 00m 57s)
* 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
* 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
* 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2bdfbd258e}} (duration: 00m 57s)
* 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I069c7b53}} (duration: 00m 58s)
* 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
* 15:10 godog: +100G to prometheus/ops in codfw
* 14:59 vgutierrez: rolling restart of eqiad pybal instances
* 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:42 vgutierrez: rolling restart of codfw pybal instances
* 14:33 vgutierrez: rolling restart of eqsin pybal instances
* 14:23 vgutierrez: rolling restart of ulsfo pybal instances
* 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
* 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
* 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
* 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
* 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
* 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
* 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
* 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
* 11:40 moritzm: installing bluez security updates
* 11:31 Lucas_WMDE: EU backport+config window done
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:703205{{!}}Add config for updated PropertySuggester beta cluster (T285098)]] (beta-only) (duration: 00m 57s)
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 09:52 moritzm: imported megacli for bullseye-wikimedia [[phab:T282272|T282272]] [[phab:T275873|T275873]]
* 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
* 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
* 08:15 vgutierrez: depool codfw text traffic
* 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
* 03:26 twentyafterfour: restarted phd on phab1001
* 03:25 twentyafterfour: investigating PHD failure


== 2020-03-19 ==
== 2021-07-16 ==
* 22:15 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@794f099]: Update mobileapps to {{Gerrit|99869f45}} (duration: 05m 13s)
* 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 22:10 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@794f099]: Update mobileapps to {{Gerrit|99869f45}}
* 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 19:14 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.24
* 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 18:30 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/Wikibase/lib/includes/Store/ByIdDispatchingEntityInfoBuilder.php: [[gerrit:581674{{!}}Fix 'max' to Int32EntityId::MAX conversion (T247985)]], part II (duration: 01m 07s)
* 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 18:24 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/Wikibase/data-access/src/SingleEntitySourceServices.php: [[gerrit:581674{{!}}Fix 'max' to Int32EntityId::MAX conversion (T247985)]], part I (duration: 01m 08s)
* 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
* 17:47 mutante: releases/releases-jenkins - closed firewall hole to port 80 for caching servers - kept it open just for envoy from the backends - ATS speaks https to them meanwhile
* 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
* 16:54 hashar@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/RelatedArticles: Do not register "" as a style path, that breaks ResourceLoader - [[phab:T248090|T248090]] (duration: 01m 07s)
* 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P<nowiki>{</nowiki>elastic2*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
* 16:01 jeh@deploy1001: Finished deploy [horizon/deploy@ad60c2b]: update horizon designate-dashboard submodule (duration: 03m 31s)
* 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
* 15:57 jeh@deploy1001: Started deploy [horizon/deploy@ad60c2b]: update horizon designate-dashboard submodule
* 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
* 15:19 andrew@deploy1001: deploy aborted: modest css change for the hiera editing dialog (take two -- I consistently forget to rebase before doing this) (duration: 00m 00s)
* 15:48 vgutierrez: restart pybal on lvs2010
* 14:54 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 14:52 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 14:48 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 15:24 godog: downtime flappy pages in codfw for 40 minutes
* 14:48 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 15:14 godog: set alert2001 as active in netbox (was staged) - [[phab:T247966|T247966]]
* 13:32 hashar@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.24 (duration: 01m 07s)
* 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
* 13:31 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.24
* 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
* 13:11 marostegui: Rename testwikidatawiki.wb_terms on db1078 - [[phab:T248086|T248086]]
* 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw [[phab:T286787|T286787]]
* 12:33 XioNoX: push frack fw policies [[phab:T248004|T248004]]
* 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
* 11:43 Lucas_WMDE: EU SWAT done
* 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:40 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.24/includes/OutputPage.php: SWAT: [[gerrit:581245{{!}}OutputPage: Fix warning when setting wgUserNewMsgRevisionId (T248049)]] (duration: 01m 08s)
* 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
* 11:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e277d29}}: trwiki: Grant interface editors editprotected & editsemiprotected ([[phab:T247672|T247672]]; take II) (duration: 01m 08s)
* 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
* 11:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|e277d29}}: trwiki: Grant interface editors editprotected & editsemiprotected ([[phab:T247672|T247672]]) (duration: 01m 07s)
* 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 10:47 ema: upload atskafka 0.4 to buster-wikimedia [[phab:T237993|T237993]]
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 10:24 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.24/skins/Vector/skin.json: [[gerrit:581248{{!}}skins.vector.styles.legacy needs to define legacy feature (T247566)]] (duration: 01m 08s)
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 10:01 ema: cp: rolling ats-tls-restart to apply log format changes [[phab:T248067|T248067]] [[phab:T237993|T237993]]
* 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 09:26 marostegui: m2 maintenance window done [[phab:T246098|T246098]]
* 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 09:03 akosiaris: restart gerrit on gerrit1001 [[phab:T246098|T246098]]
* 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:02 akosiaris: restart otrs-daemon, apache on mendelevium [[phab:T246098|T246098]]
* 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
* 09:01 akosiaris: restart recommendation-api on scb [[phab:T246098|T246098]]
* 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
* 09:00 marostegui: Restart m2 primary database master - [[phab:T246098|T246098]]
* 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers ([[phab:T279309|T279309]])
* 08:48 dcausse: depooling wdqs1006 to help catching up lag
* 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 08:43 dcausse: restarting blazegraph on wdqs1006 ([[phab:T242453|T242453]])
* 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 07:54 moritzm: installing cups updates from Stretch point release
* 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 07:48 moritzm: installing libjaxen-java security updates from Stretch point release
* 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 07:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Update pc1008 spare situation [[phab:T247787|T247787]] (duration: 01m 09s)
* 12:39 mutante: mw1412 through mw1428 - set to active in netbox ([[phab:T279309|T279309]])
* 06:49 elukey: execute 'sudo rm /etc/logrotate.d/ceph-common' on cloudvirt-dev and cloudcontrol-dev to stop daily cronspam
* 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 06:46 marostegui: Deploy schema change on testcommonswiki.globalimagelinks (empty table) on the s4 master [[phab:T243987|T243987]]
* 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
* 06:33 marostegui: Upgrade db1132 without restarting [[phab:T246098|T246098]]
* 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
* 00:39 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikiws to 1.35.0-wmf.24 refs [[phab:T233872|T233872]]
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
* 00:31 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.24/skins/Vector/includes/templates/index.mustache: deploy https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/581116 which reverts https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/581054 refs  [[phab:T248010|T248010]] (duration: 01m 07s)
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
* 00:18 eileen: civicrm revision changed from {{Gerrit|a1b2cbeac1}} to {{Gerrit|1c477ff07f}}, config revision is {{Gerrit|37232d8460}}
* 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
* 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
* 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
* 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
* 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
* 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
* 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
* 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
* 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
* 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)


== 2020-03-18 ==
== 2021-07-15 ==
* 23:31 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.23/includes/TemplateParser.php: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/581114/ refs [[phab:T248010|T248010]] (duration: 01m 07s)
* 23:32 brennen: checking stashbot: [[phab:T286756|T286756]]
* 23:26 twentyafterfour@deploy1001: Synchronized php-1.35.0-wmf.24/includes/TemplateParser.php: sync https://gerrit.wikimedia.org/r/c/mediawiki/core/+/581115/ (duration: 01m 08s)
* 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: [[gerrit:704815{{!}}Fix creation of mw.Message objects (T286385)]] (duration: 00m 57s)
* 22:22 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # [[phab:T285811|T285811]]
* 22:18 volans@cumin1001: START - Cookbook sre.dns.netbox
* 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # [[phab:T285811|T285811]]
* 21:56 Krinkle: krinkle@mw1385: scap pull # clean up AdHoc debugging for [[phab:T248010|T248010]]
* 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # [[phab:T285811|T285811]]
* 21:16 brennen@deploy1001: Synchronized php-1.35.0-wmf.24/skins/Vector/includes/templates/index.mustache: [[gerrit:581054{{!}}Change master template to force cache invalidation of partials]] (duration: 01m 06s)
* 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
* 21:11 brennen@deploy1001: Synchronized php-1.35.0-wmf.23/skins/Vector/includes/templates/index.mustache: [[gerrit:581054{{!}}Change master template to force cache invalidation of partials]] (duration: 01m 15s)
* 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 20:04 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki [[phab:T284928|T284928]]
* 19:58 volans@cumin1001: START - Cookbook sre.dns.netbox
* 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
* 19:49 hashar@deploy1001: rebuilt and synchronized wikiversions files: Ensure fleet wide consistency
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 19:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 19:21 mutante: shutting down (decom cookbook) elnath.codfw.wmnet ([[phab:T188544|T188544]])
* 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 19:15 fdans@deploy1001: Finished deploy [analytics/refinery@549f6a4]: deploying analytics refinery (duration: 15m 02s)
* 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 19:11 hashar: 1.35.0-wmf.24 is on hold: too many blockers
* 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 19:00 fdans@deploy1001: Started deploy [analytics/refinery@549f6a4]: deploying analytics refinery
* 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:32 Lucas_WMDE: Morning SWAT done
* 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
* 18:30 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
* 18:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:579018{{!}}Update linter whitelist w/ parsoid11's IP address (T246833)]] (beta-only) (duration: 01m 04s)
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 18:20 Lucas_WMDE: scap pull on mwdebug1001, attempting to fix mismatched wikiversions alert
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 18:14 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:580373{{!}}Add beta configuration for Wikibase reference formatting (T247416)]] (duration: 01m 08s)
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 18:13 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 18:13 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:580373{{!}}Add beta configuration for Wikibase reference formatting (T247416)]], take II (duration: 01m 07s)
* 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:11 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 06s)
* 18:11 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:580373{{!}}Add beta configuration for Wikibase reference formatting (T247416)]] (duration: 01m 07s)
* 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 07s)
* 16:43 mutante: wtp1025 - Icinga alerted it's running out of disk - 'apt-get clean' lowered disk usage from 97% to 91%
* 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
* 16:00 hashar@deploy1001: Finished scap: testwiki to 1.35.0-wmf.24 and rebuild l10n cache - [[phab:T233872|T233872]] (duration: 61m 23s)
* 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
* 14:58 hashar@deploy1001: Started scap: testwiki to 1.35.0-wmf.24 and rebuild l10n cache - [[phab:T233872|T233872]]
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:41 vgutierrez: disable TLS session tickets in ulsfo - [[phab:T245616|T245616]] [[phab:T170567|T170567]]
* 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
* 14:29 godog: add debug to icinga2001 - [[phab:T247538|T247538]]
* 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 14:28 _joe_: restarted php-fpm on mw1283, was throwing SIGILL
* 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
* 14:17 marostegui: Rename wb_terms on codfw hosts: s8 (wikidatawiki - db2081), s3 (testwikidatawiki - db2109), s4 (commonswiki, testcommonswiki - db2106)  [[phab:T208425|T208425]]
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 14:06 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.23
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 11:59 hashar@deploy1001: Synchronized php-1.35.0-wmf.24/includes/objectcache/ObjectCache.php: objectcache: Restore keyspace for LocalServerCache service - [[phab:T247562|T247562]] (duration: 01m 07s)
* 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
* 11:57 hashar@deploy1001: Synchronized php-1.35.0-wmf.23/includes/objectcache/ObjectCache.php: objectcache: Restore keyspace for LocalServerCache service - [[phab:T247562|T247562]] (duration: 01m 10s)
* 16:40 ejegg: updated payments-wiki from {{Gerrit|d9892207c1}} to {{Gerrit|844b59ee42}}
* 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db1087, vslow host weight in main, given that the CPU across s8 is now doing a lot better', diff saved to https://phabricator.wikimedia.org/P10715 and previous config saved to /var/cache/conftool/dbconfig/20200318-114259-marostegui.json
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 11:17 ema: upload atskafka 0.3 to buster-wikimedia [[phab:T237993|T237993]]
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 11:16 kart_: EU Mid-day SWAT done
* 16:27 ejegg: updated fundraising CiviCRM from {{Gerrit|e0d53c92b5}} to {{Gerrit|20e9ef6bbb}}
* 11:11 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}579893{{!}}Enable ContentTranslation as a default tool in Malay, Azerbaijani and Estonian WPs (T246622, T246628, T246629)]], take II (duration: 01m 07s)
* 16:24 ejegg: updated payments-wiki from {{Gerrit|0e7800027a}} to {{Gerrit|844b59ee42}}
* 11:10 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}579893{{!}}Enable ContentTranslation as a default tool in Malay, Azerbaijani and Estonian WPs (T246622, T246628, T246629)]] (duration: 01m 07s)
* 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 10:58 _joe_: setting num_retries=0 on mw2224 for eventgate-analytics in envoy ([[phab:T247484|T247484]])
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:58 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Stop writing to old term store (wb_terms table) in wikidata (T208425)]], take II (duration: 01m 06s)
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:55 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Stop writing to old term store (wb_terms table) in wikidata (T208425)]] (duration: 01m 08s)
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:52 _joe_: setting num_retries=0, idle_timeout=5s on mw2223 for eventgate-analytics in envoy ([[phab:T247484|T247484]])
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:48 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Stop writing to old term store in testwikidatawiki (T208425)]], take II (duration: 01m 07s)
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:45 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Stop writing to old term store in testwikidatawiki (T208425)]] (duration: 01m 07s)
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:33 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Read from the new term store everywhere (T219123)]], take II (duration: 01m 07s)
* 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704773{{!}}flaggedrevs: Allow admins of idwiki to change stablesettings (T268317)]], try II (duration: 01m 05s)
* 10:31 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Read from the new term store everywhere (T219123)]] (duration: 01m 07s)
* 15:03 Amir1: temporary becoming admin on idwiki to debug [[phab:T268317|T268317]]
* 10:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Read from the new term store everywhere (T219123)]], take II (duration: 01m 07s)
* 15:02 moritzm: installing nginx security updates on ms-fe*
* 10:12 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Read from the new term store everywhere (T219123)]] (duration: 01m 08s)
* 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:43 vgutierrez: enabling inbound TLSv1.3 in upload@ulsfo - [[phab:T170567|T170567]]
* 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:18 vgutierrez: enabling inbound TLSv1.3 in cp4026 - [[phab:T170567|T170567]]
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 08:44 marostegui: Start replication pc1008 from pc1010 to get some of the new keys so it is not fully empty - [[phab:T247787|T247787]]
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 08:14 vgutierrez: upgrade ATS to 8.0.6-1wm3 in ulsfo - [[phab:T170567|T170567]]
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 07:55 moritzm: installing remaining libxslt security updates
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 07:40 oblivian@deploy1001: Synchronized wmf-config/ProductionServices.php: eventgate-analytics to use envoy everywhere (duration: 01m 10s)
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 07:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 06:31 marostegui: Reboot pc1008 to try to get its RAID redone - [[phab:T247787|T247787]]
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 00:31 Amir1: foreachwikiindblist medium deleteEqualMessages.php --delete ([[phab:T247562|T247562]])
* 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
* 00:10 crusnov@deploy1001: Finished deploy [netbox/deploy@14256f9]: netbox 2.7.10 upgrade (duration: 02m 29s)
* 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 00:08 crusnov@deploy1001: Started deploy [netbox/deploy@14256f9]: netbox 2.7.10 upgrade
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 00:07 crusnov@deploy1001: Finished deploy [netbox/deploy@14256f9]: netbox 2.7.10 upgrade (duration: 01m 17s)
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 00:06 crusnov@deploy1001: Started deploy [netbox/deploy@14256f9]: netbox 2.7.10 upgrade
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
* 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
* 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
* 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
* 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
* 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
* 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 13:05 mutante: mw1413 - pooling, was depooled but for unknown reason, dont see it in SAL, looks ok, scap pulled
* 13:03 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1422.eqiad.wmnet
* 13:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[3-5].eqiad.wmnet
* 12:54 mutante: mw1423, mw1424, mw1425 - pooled as new API servers
* 12:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:53 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[3-5].eqiad.wmnet
* 12:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:34 mutante: mw1423, mw1424, mw1425 - scap pull
* 12:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:09 mutante: mw1423,mw1424,mw1425 - rebooting
* 11:48 moritzm: restarting restbase1028-1030 to pick up libuv security update
* 11:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:47 mutante: mw1423, mw1424, mw1425 - initial puppet run, new API appservers going into production
* 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704527{{!}}Make idwiki use protect mode of flaggedrevs (T268317)]] (duration: 01m 07s)
* 11:40 moritzm: restarting Etherpad to pick up libuv security update
* 11:37 moritzm: restarting Turnilo to pick up libuv security update
* 11:34 moritzm: installing libuv1 security updates
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 10 hosts
* 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 11:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 11:05 volans@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 10:56 volans: commented out cron-spam entries on thanos-fe2001, puppet is disabled, thanos-store.service fails to start - [[phab:T285835|T285835]]
* 10:41 godog: move wikibase.queryService.ui.app to wikibase.queryService.ui.index.app - [[phab:T272128|T272128]]
* 10:34 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 10:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 10:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 10:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 10:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:26 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 10:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 10:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 10:02 effie: disableing puppet on maps* for 704394
* 09:38 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:11 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-8].eqiad.wmnet
* 09:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:29 elukey: sudo rm /etc/rawdog/en/feeds/847a7185.state* on planet1002 (corrupted file) - backup in /home/elukey + restart planet-update-en.service
* 08:12 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-8].eqiad.wmnet
* 08:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 08:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 07:48 moritzm: updated bullseye d-i image for latest daily build [[phab:T275873|T275873]]
* 07:31 godog: reimage thanos-fe2001 with bullseye - [[phab:T285835|T285835]]
* 07:23 elukey: restart planet-update-en.service on planet1002
* 07:17 elukey: remove /etc/rawdog/en/<nowiki>{</nowiki>state,state.lock<nowiki>}</nowiki> on planet1002 (following what rawdog suggested) due to corrupted files (backups available in /home/elukey/en)
* 06:51 elukey: restart phabricator_clean_tmp_files.service on phab1001 - transient error (tmp files already cleaned up)
* 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 06s)
* 06:47 tstarling@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 07s)
* 05:50 kart_: Updated cxserver to 2021-07-14-124232-production ([[phab:T282369|T282369]], [[phab:T284450|T284450]])
* 05:47 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:43 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:41 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 00:00 twentyafterfour: phabricator update deployed.


== 2020-03-17 ==
== 2021-07-14 ==
* 22:49 Amir1: warming up cache for Q80M to Q88M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 23:23 eileen: civicrm revision changed from {{Gerrit|b1c63470bb}} to {{Gerrit|e0d53c92b5}}, config revision is {{Gerrit|bb405c5232}}
* 22:17 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@0adead4]: Update mobileapps to {{Gerrit|ec6fd6e}} (duration: 06m 08s)
* 21:19 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.14 (duration: 01m 05s)
* 22:11 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@0adead4]: Update mobileapps to {{Gerrit|ec6fd6e}}
* 21:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.14
* 21:54 Krinkle: krinkle@mw2170$ disable-puppet (Testing for [[phab:T99740|T99740]])
* 21:08 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/User.php: Backport: [[gerrit:704609{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 21:15 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable Depicts counting (again) ([[phab:T247874|T247874]]) (duration: 01m 07s)
* 20:58 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/User.php: Backport: [[gerrit:704608{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 21:10 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: WikimediaEditorTasks: Enable Depicts counting ([[phab:T247874|T247874]]) (duration: 01m 07s)
* 20:51 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.14
* 20:50 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/WikimediaEditorTasks: Fix revert counting for non-language-specific counters, take 2 ([[phab:T244974|T244974]]) (duration: 01m 12s)
* 19:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/resources: Backport: [[gerrit:704606{{!}}Fix deprecated offset() on invalid DOM (T185629)]] (duration: 01m 07s)
* 20:33 mutante: boron - systemctl start docker-reporter-k8s-images ; systemctl start docker-reporter-releng-images
* 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@156a984]: fix trove-dashboard bug (duration: 04m 18s)
* 20:31 mutante: boron - had degraded systemd state in Icinga - systemctl start docker-reporter-base-images
* 19:26 andrew@deploy1002: Started deploy [horizon/deploy@156a984]: fix trove-dashboard bug
* 19:54 mutante: miscweb1001 - restarted ferm, reverted live hack
* 19:17 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 19:53 ppchelko@deploy1001: Finished deploy [restbase/deploy@8db09ed]: Various PCS endpoints additions and fixes [[phab:T247295|T247295]] [[phab:T247096|T247096]] [[phab:T244175|T244175]] (duration: 14m 31s)
* 19:17 nskaggs@cumin1001: Added views for new wiki: dagwiki [[phab:T284456|T284456]]
* 19:51 mutante: miscweb1001 - testing if ferm 80 firewall hole is needed for envoy, temp. disabled puppet, restarted ferm
* 18:55 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 19:38 ppchelko@deploy1001: Started deploy [restbase/deploy@8db09ed]: Various PCS endpoints additions and fixes [[phab:T247295|T247295]] [[phab:T247096|T247096]] [[phab:T244175|T244175]]
* 18:54 nskaggs@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
* 19:01 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q80M (T219123)]], take II (duration: 01m 06s)
* 18:54 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 19:00 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q80M (T219123)]] (duration: 01m 07s)
* 18:36 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 18:53 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.24/extensions/Wikibase/lib/includes/Store/Sql/Terms/DatabaseItemTermStoreWriter.php: [[gerrit:580390{{!}}Do not lock rows when there's no term returned (T247553 T246898)]], To catch the train (duration: 01m 08s)
* 18:36 nskaggs@cumin1001: Added views for new wiki: banwikisource [[phab:T284390|T284390]]
* 18:50 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 18:30 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:14 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:45 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 17:52 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 18:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 17:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 18:39 mutante: removing mw1238 through mw1243 - decom with cookbook ([[phab:T247780|T247780]] [[phab:T245099|T245099]])
* 17:49 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 18:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 17:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 18:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 17:39 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 17:35 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 18:35 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw123[8-9].eqiad.wmnet
* 17:35 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704383{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 06s)
* 18:35 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw124[0-3].eqiad.wmnet
* 17:00 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704382{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 05s)
* 18:29 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 16:27 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 18:01 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@b6bff94]: Update mobileapps to {{Gerrit|3c73ca3}} (duration: 06m 06s)
* 16:26 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 18:00 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 16:11 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Translate: Backport: [[gerrit:704404{{!}}TranslationAid: Handle empty message definition (T285830)]] and [[gerrit:704405{{!}}TranslationAid: Make sure to return successfully fetched definitions (T285830)]] (duration: 01m 09s)
* 17:58 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 16:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 17:56 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.23/languages/LanguageConverter.php: [[gerrit:580361{{!}}languages: Don't assume  in LanguageConverter (T235360)]] (duration: 01m 07s)
* 15:37 moritzm: installing klibc security updates
* 17:55 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@b6bff94]: Update mobileapps to {{Gerrit|3c73ca3}}
* 15:36 ottomata: deploying eventgate-analytics with direct service-runner promethues support
* 17:55 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 15:34 ryankemper: [Elastic] Manually triggering readahead mitigation across whole fleet to prevent any further issues today: `ryankemper@cumin1001:~$ sudo cumin -b 12 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl restart elasticsearch-disable-readahead.service'` (still need to investigate why `elasticsearch-disable-readahead.timer` isn't re-firing every 30 mins as desired)
* 17:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw124[0-3].eqiad.wmnet
* 15:34 moritzm: installing apache security updates on otrs1001 (ticket.wikimedia.org)
* 17:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw123[89].eqiad.wmnet
* 15:34 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 17:52 Amir1: warming up cache for Q70M to Q80M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 15:28 urbanecm: Start server-side upload of 3 large image files ([[phab:T285708|T285708]])
* 17:46 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/Wikibase/lib/includes/Store/Sql/Terms/DatabaseItemTermStoreWriter.php: [[gerrit:580352{{!}}Do not lock rows when there's no term returned (T247553 T246898)]] (duration: 01m 07s)
* 15:16 moritzm: installing apache security updates on lists1001 (lists.wikimedia.org)
* 17:42 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:51 moritzm: installing apache security updates on puppet masters
* 17:40 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 14:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2384.codfw.wmnet
* 17:37 ejegg: updated payments-wiki from {{Gerrit|86ce0361f9}} to {{Gerrit|72856949a1}}
* 14:47 effie: set mw2384 as inactive to investigate mw2383 issue - [[phab:T286463|T286463]]
* 17:30 bearND: mobileapps deploy failed on canary, rolled back
* 14:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 17:29 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@266e6da]: Update mobileapps to {{Gerrit|6370784}} (duration: 04m 00s)
* 14:44 moritzm: installing apache security updates on grafana*
* 17:25 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@266e6da]: Update mobileapps to {{Gerrit|6370784}}
* 14:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 17:24 elukey@deploy1001: Finished deploy [analytics/superset/deploy@3f3ddcb]: Upgrade PyHive to 0.6.2 (duration: 00m 43s)
* 14:43 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 17:24 elukey@deploy1001: Started deploy [analytics/superset/deploy@3f3ddcb]: Upgrade PyHive to 0.6.2
* 14:40 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1280.eqiad.wmnet
* 14:40 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1280.eqiad.wmnet
* 14:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1422.eqiad.wmnet
* 17:10 jynus: purging some old rows on pc1010 on a screen to earn some time [[phab:T247788|T247788]]
* 14:33 dcausse: runnning elasticsearch-madvise-random ES_PID on elastic2045
* 16:56 mutante: mw1280 - scap pull - had ancient mw version due to downtime
* 14:31 dcausse: runnning elasticsearch-madvise-random 1022 on elastic2054
* 16:46 mutante: mw1280 back after long downtime due to broken RAM, added back into puppet ([[phab:T240187|T240187]])
* 14:23 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:19 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:13 elukey: restart php-fpm on mw2370
* 15:56 brennen@deploy1001: rebuilt and synchronized wikiversions files: Reverting All wikis to 1.35.0-wmf.23
* 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 15:52 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 13:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 15:52 brennen@deploy1001: sync-wikiversions aborted: All wikis to 1.35.0-wmf.23 (duration: 05m 16s)
* 13:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277118|T277118]]
* 15:51 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 13:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277118|T277118]]
* 15:50 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1005.eqiad.wmnet
* 15:44 brennen@deploy1001: sync-wikiversions aborted: All wikis to 1.35.0-wmf.23 (duration: 03m 49s)
* 12:43 urbanecm: Start server-side upload of 3 large image files ([[phab:T285708|T285708]])
* 15:36 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1005.eqiad.wmnet
* 15:36 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 15:23 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:23 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 15:11 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 12:15 mutante: mw1422 - scap pull
* 15:01 hashar: scap prep 1.35.0-wmf.24 and applying security patches # [[phab:T233872|T233872]]
* 12:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1422.eqiad.wmnet
* 15:00 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:02 moritzm: upgrading python3-wmflib fleetwide to 0.0.8 (needed for new logout.d wrapper)
* 14:57 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
* 14:44 dcausse: wdqs1010 (test server) is running a data-reload cookbook (and is probably taking longer than the expected downtime)
* 12:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
* 14:38 hashar: mediawiki/core git push {{Gerrit|68bc9300dc}}:wmf/1.35.0-wmf.24  to catch up with a change that got merged while branch is being cut # [[phab:T233872|T233872]]
* 11:52 mutante: mw1422 - new setup, not in prod yet
* 14:29 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q70M (T219123)]], take II (duration: 01m 04s)
* 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 14:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q70M (T219123)]] (duration: 01m 10s)
* 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
* 14:24 marostegui: Stop mysql and restart pc1008 [[phab:T247787|T247787]]
* 11:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
* 14:23 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 11:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 14:21 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 11:49 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704525{{!}}Remove reviewer user group in ruwiki (T284589)]] (duration: 01m 05s)
* 14:14 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/Wikibase/lib/includes/Store/Sql/Terms/DatabaseItemTermStoreWriter.php: [[gerrit:580328{{!}}Store item terms at late as possible to avoid deadlocks (T247553 T246898)]] (duration: 01m 07s)
* 11:40 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
* 14:13 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 11:39 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:700854{{!}}flaggedrevs: Reduce levels for ruwiki to 1 (T284589)]] (duration: 01m 05s)
* 14:12 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 11:37 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
* 14:09 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:23 ariel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet
* 14:07 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|72027e136f10867f5db02043b7505390e49130d1}}: Disable indexing in NS_USER and NS_USER_TALK on bnwiki ([[phab:T286152|T286152]]) (duration: 02m 07s)
* 14:07 herron@cumin1001: START - Cookbook sre.hosts.downtime
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4dc11d2333cbf70a4eb20f3fb94a9e363b41d2df}}: Change category name of Babel extension on Javanese Wikipedia ([[phab:T286165|T286165]]) (duration: 02m 10s)
* 14:06 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 14:03 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 13:41 hashar: Branching 1.35.0-wmf.24 # [[phab:T233872|T233872]]
* 09:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277118|T277118]]
* 13:30 godog: stop puppet and turn on debug on icinga2001 - [[phab:T247538|T247538]]
* 09:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277118|T277118]]
* 12:06 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277118|T277118]]
* 12:06 cdanis@cumin1001: START - Cookbook sre.network.cf
* 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277118|T277118]]
* 11:46 godog: test pinning icinga to a subset of cpu on icinga1001
* 09:27 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php-1.37.0-wmf.14]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=testwiki # [[phab:T285811|T285811]]
* 11:16 akosiaris: [[phab:T242461|T242461]] undeploy restrouter. Unused service and per task to not  be used after all
* 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277118|T277118]]
* 11:16 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'restrouter' for release 'staging' .
* 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277118|T277118]]
* 11:15 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'restrouter' for release 'production' .
* 07:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277118|T277118]]
* 11:15 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'restrouter' for release 'production' .
* 07:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277118|T277118]]
* 10:56 XioNoX: add extra prepend to LG export filter
* 07:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277118|T277118]]
* 10:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 07:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277118|T277118]]
* 10:41 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 07:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T277118|T277118]]
* 10:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 07:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T277118|T277118]]
* 10:40 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 00:58 eileen: process control updated to {{Gerrit|c291b3c6890364281d}}
* 10:40 jbond42: sec update for libgraphicsmagick on maps
* 00:58 eileen: {{Gerrit|c291b3c6890364281d}}
* 10:20 godog: bounce squid on install1003 [[phab:T247759|T247759]]
* 00:49 eileen: civicrm revision changed from {{Gerrit|bb62188ec6}} to {{Gerrit|b1c63470bb}}, config revision is {{Gerrit|c291b3c689}}
* 10:07 _joe_: sudo cumin -b2 -s 50 'A:mw-jobrunner' 'restart-php7.2-fpm' [[phab:T247622|T247622]]
* 00:48 eileen: process-control config revision is {{Gerrit|c291b3c689}}
* 10:03 Amir1: warming up cache for Q60M to Q70M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 00:15 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fix conf cache conflict with Defines.php noticed in beta (duration: 02m 09s)
* 10:02 ema: create kafka topic atskafka_test_webrequest_text [[phab:T247497|T247497]]
* 09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 09:55 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q60M (T219123)]], take II (duration: 01m 05s)
* 09:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q60M (T219123)]] (duration: 01m 09s)
* 09:27 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 09:21 ema: cp: rolling varnish-frontend-restart to decrease memory usage and apply transient storage limits [[phab:T185968|T185968]]
* 09:09 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 08:39 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 00:57 krinkle@deploy1001: Synchronized php-1.35.0-wmf.23/extensions/Wikibase/lib/includes/Formatters/: {{Gerrit|Ic77b2c6b33a}}, [[phab:T247458|T247458]] (duration: 01m 12s)


== 2020-03-16 ==
== 2021-07-13 ==
* 23:14 tzatziki: reset email for "MNadrofsky (WMF)" on SUL and officewiki
* 23:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: {{Gerrit|f3627361ff558c89d4a4452ff24b3457f46a4f46}}: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector ([[phab:T286587|T286587]]) (duration: 02m 08s)
* 20:58 mutante: mw1223 power down
* 23:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: {{Gerrit|f3627361ff558c89d4a4452ff24b3457f46a4f46}}: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector ([[phab:T286587|T286587]]) (duration: 02m 07s)
* 20:54 mutante: powercycling mw1223
* 23:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
* 20:52 mutante: 5 old API appservers in eqiad removed
* 23:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
* 20:45 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
* 20:43 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
* 20:42 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw122[1-6].eqiad.wmnet
* 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
* 20:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
* 20:35 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 22:22 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
* 20:04 mutante: depool (yes->no) mw1221 - mw1226 ([[phab:T247780|T247780]])
* 22:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
* 20:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw122[1-6].eqiad.wmnet
* 22:18 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Use Score with lilypond's safe mode only (duration: 02m 06s)
* 19:28 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@f5600d6]: Update mobileapps to {{Gerrit|8a6e403}} (duration: 06m 48s)
* 20:53 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 19:26 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 20:30 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/skins/Skin.php: Backport: [[gerrit:704368{{!}}links is flat array (T286040)]] (duration: 02m 07s)
* 19:24 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 20:26 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.9 (duration: 04m 21s)
* 19:23 jynus: stop replication at pc1010 at pos pc1007-bin.080617:{{Gerrit|259138670}}
* 20:19 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.14 (duration: 31m 56s)
* 19:21 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@f5600d6]: Update mobileapps to {{Gerrit|8a6e403}}
* 19:47 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.14
* 19:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Pool pc1010 instead of pc1008 as pc1008 is overloaded (duration: 01m 06s)
* 19:02 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 18:38 krinkle@deploy1001: Synchronized wmf-config/: {{Gerrit|I2c3217fb3da8bb65}} (duration: 01m 07s)
* 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1283.eqiad.wmnet
* 18:36 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: no-op, courtesy of opcache (duration: 01m 06s)
* 17:45 mutante: mw1283 - decom - powered off by cookbook
* 18:34 krinkle@deploy1001: Synchronized docroot/noc/: {{Gerrit|I2c3217fb3}} (duration: 01m 07s)
* 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1283.eqiad.wmnet
* 18:18 mforns@deploy1001: Finished deploy [analytics/refinery@1681b92]: deploying refinery to add forgotten artifacts for v0.0.118 (duration: 13m 01s)
* 17:41 mutante: homer "asw2-a*eqiad*" commit "decom mw1282 - [[phab:T280203|T280203]]"
* 18:05 mforns@deploy1001: Started deploy [analytics/refinery@1681b92]: deploying refinery to add forgotten artifacts for v0.0.118
* 17:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
* 17:08 Amir1: warming up cache for Q50M to Q60M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
* 17:06 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q50M (T219123)]], take II (duration: 01m 08s)
* 17:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
* 17:03 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q50M (T219123)]] (duration: 01m 06s)
* 17:09 mutante: mw1282 - decom, powered off
* 16:54 gehel: repooling wdqs1005
* 17:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
* 16:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Enforce Content Security Policy if wmgUseCSP is set [[phab:T244124|T244124]] (duration: 01m 06s)
* 17:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1281.eqiad.wmnet
* 16:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touch and secondary sync of IS for cache-busting (duration: 01m 07s)
* 17:05 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: Backport: [[gerrit:704181{{!}}Do not lock user_preferences before updating (T286521)]] (duration: 01m 58s)
* 16:48 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wmgUseCSP false everywhere [[phab:T244124|T244124]] (duration: 01m 07s)
* 16:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:34 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I498e2ebd8c9}} (duration: 01m 07s)
* 16:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:33 krinkle@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: {{Gerrit|I498e2ebd8c9}} (no-op) (duration: 01m 07s)
* 16:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:30 krinkle@deploy1001: Synchronized wmf-config/wgConf.php: {{Gerrit|I870122f946d}} (duration: 01m 07s)
* 16:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:22 rlazarus: copied envoyproxy_1.13.1-1 from buster-wikimedia to stretch-wikimedia
* 16:55 jbond: upload statograph to buster wikimedia
* 16:21 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I08af45e2e47}} (duration: 01m 07s)
* 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1281.eqiad.wmnet
* 16:14 krinkle@deploy1001: Synchronized wmf-config/wgConf.php: {{Gerrit|Ie9002d9095ee}} (duration: 01m 08s)
* 16:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom [[phab:T28203|T28203]]
* 15:04 akosiaris: [[phab:T234181|T234181]] upload apertium-recursive_0.0.1-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom [[phab:T28203|T28203]]
* 15:04 akosiaris: [[phab:T234181|T234181]] upload apertium-anaphora_0.0.4-1+wmf1 to apt.wikimedia.org jessie-wikimedia/main
* 16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom [[phab:T28203|T28203]]
* 15:02 moritzm: rolling restart of FPM/apache on netmon* to pick up libxslt security updates
* 16:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom [[phab:T28203|T28203]]
* 14:22 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q40M (T219123)]], take II (duration: 01m 06s)
* 16:25 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw128[1-3].eqiad.wmnet
* 14:22 Amir1: warming up cache for Q40M to Q50M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 15:34 topranks: Adding IX peering to AS393950 (Xiber LLC) on cr2-eqiad.
* 14:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579925{{!}}Set up read new term store up to Q40M (T219123)]] (duration: 01m 07s)
* 15:20 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:16 moritzm: rolling restart of FPM on mw1261-mw1265 to pick up libxslt security updates
* 15:19 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:15 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --from-id {{Gerrit|87500000}} --to-id {{Gerrit|87767570}} --batch-size=10 --sleep=5 ([[phab:T219123|T219123]])
* 14:52 volker-e@deploy1002: Finished deploy [design/style-guide@5c07233]: Deploy design/style-guide: {{Gerrit|5c07233}} “Components”: Add WikimediaUI theme Figma links to various components (#483) (duration: 00m 06s)
* 14:05 moritzm: installing libxslt security updates
* 14:52 volker-e@deploy1002: Started deploy [design/style-guide@5c07233]: Deploy design/style-guide: {{Gerrit|5c07233}} “Components”: Add WikimediaUI theme Figma links to various components (#483)
* 13:49 ema: upload atskafka 0.1 to buster-wikimedia [[phab:T237993|T237993]]
* 14:35 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 13:42 gehel: restarting blazegraph on wdqs1007
* 14:35 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 13:30 gehel: depooling wdqs1005 to catch up on lag
* 13:57 otto@deploy1002: Finished deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job  - [[phab:T271232|T271232]] (duration: 03m 28s)
* 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1015', diff saved to https://phabricator.wikimedia.org/P10706 and previous config saved to /var/cache/conftool/dbconfig/20200316-124309-marostegui.json
* 13:53 otto@deploy1002: Started deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job  - [[phab:T271232|T271232]]
* 12:09 Amir1: warming up cache for Q35M to Q40M for new term store on db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 13:37 effie: rolling restart php-fpm across clusters - [[phab:T286260|T286260]]
* 12:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579913{{!}}Set up read new term store up to Q35M (T219123)]], take II (duration: 01m 07s)
* 13:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/lib/includes/SimpleCacheWithBagOStuff.php: Backport: [[gerrit:704176{{!}}Send TTL instead of expiry in unix timestamp in calling BagOStuff (T286260)]] (duration: 00m 58s)
* 12:05 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:579913{{!}}Set up read new term store up to Q35M (T219123)]] (duration: 01m 08s)
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 11:52 XioNoX: manually fix prometheus squid exporter on install1003
* 13:29 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 11:04 Amir1: ... for Q30M-Q35M of the new term store
* 13:14 kormat: restarted replication on db1117:3325 [[phab:T284622|T284622]]
* 11:04 Amir1: Warming up InnoDB buffer pool cache in db1111, db1126, db1104, db1092 ([[phab:T219123|T219123]])
* 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 1732 hosts
* 10:55 Amir1: warming up db1026 for up to Q35M for the new term store ([[phab:T219123|T219123]])
* 13:10 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1015', diff saved to https://phabricator.wikimedia.org/P10705 and previous config saved to /var/cache/conftool/dbconfig/20200316-104723-marostegui.json
* 13:10 hashar: Upgraded Apache on gerrit1001 and gerrit2001
* 10:45 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: "Set term store to WRITE_BOTH for all of Wikidata" ([[phab:T219123|T219123]]), take II (duration: 01m 07s)
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
* 10:43 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: "Set term store to WRITE_BOTH for all of Wikidata" ([[phab:T219123|T219123]]) (duration: 01m 13s)
* 13:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1015', diff saved to https://phabricator.wikimedia.org/P10704 and previous config saved to /var/cache/conftool/dbconfig/20200316-104002-marostegui.json
* 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
* 10:36 elukey: roll restart of recommendation service on scb* as attempt to fix the flapping alerts - [[phab:T247732|T247732]]
* 12:53 kormat: stopping replication on db1117:3325 [[phab:T284622|T284622]]
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1015', diff saved to https://phabricator.wikimedia.org/P10703 and previous config saved to /var/cache/conftool/dbconfig/20200316-102829-marostegui.json
* 12:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 [[phab:T284622|T284622]]
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1015', diff saved to https://phabricator.wikimedia.org/P10702 and previous config saved to /var/cache/conftool/dbconfig/20200316-101707-marostegui.json
* 12:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 [[phab:T284622|T284622]]
* 10:10 marostegui: Stop mysql for upgrade on es1015 [[phab:T239791|T239791]]
* 12:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 10:02 Amir1: start of ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=50 --sleep=0 --file=15march2217-holes-nulls.list on screen ([[phab:T219123|T219123]])
* 12:41 mutante: depooling and decom'ing eqiad API servers mw1281, mw1282, mw1283 - [[phab:T280203|T280203]]
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1015 for upgrade and restart [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10701 and previous config saved to /var/cache/conftool/dbconfig/20200316-093228-marostegui.json
* 12:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[1-3].eqiad.wmnet
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1011 to es2 master, this is a NOOP [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10700 and previous config saved to /var/cache/conftool/dbconfig/20200316-093048-marostegui.json
* 12:20 mutante: mwmaint1002 - scap pull after reimaging
* 08:16 marostegui: Review and enable events on recently migrated 10.4 hosts - [[phab:T247728|T247728]]
* 11:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
* 08:02 ema: cp4025 restart trafficserver-tls to clear 'tls process restarted' alert [[phab:T241593|T241593]] [[phab:T185968|T185968]]
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
* 07:57 moritzm: installing libxslt security updates
* 11:28 Lucas_WMDE: EU backport+config window done
* 07:52 ema: cp4025: restart varnish-fe to clear 'child restarted' alert [[phab:T185968|T185968]]
* 11:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:704304{{!}}Remove obsolete $wgShowDBErrorBacktrace config]] (duration: 01m 25s)
* 07:47 moritzm: installing lxml security updates
* 11:13 mutante: mwmaint1002 - reimaging with buster ([[phab:T267607|T267607]])
* 07:14 moritzm: installing libgd2 security updates on jessie
* 10:54 mutante: switching https://noc.wikimedia.org backened from eqiad to codfw for mwmaint1002 OS upgrade, not affecting config-master/pybal, tests passed ([[phab:T267607|T267607]])
* 06:54 moritzm: removing some library packages from jessie/stretch after labstore1006/1007 dist-upgrade to buster
* 10:44 moritzm: upgrading apache on phab1001 (phabricator.wikimedia.org)
* 06:38 _joe_: restart envoy with 10 requests per connection on mw2231, [[phab:T247484|T247484]]
* 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:39 hnowlan: running `nodetool decommission` on maps2008
* 10:27 moritzm: installing apache security updates on alert1001 (icinga.wikimedia.org)
* 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277116|T277116]]
* 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277116|T277116]]
* 10:18 moritzm: installing apache security updates on Logstash hosts
* 09:58 moritzm: upgrading PHP/Apache on matomo1002 (piwik.wikimedia.org)
* 09:40 moritzm: installing apache security updates on thanos-fe hosts
* 09:38 moritzm: installing apache security updates on parsoid hosts
* 09:31 effie: depool mw2383 [[phab:T286463|T286463]]
* 09:18 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 09:15 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277116|T277116]]
* 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277116|T277116]]
* 08:59 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
* 08:59 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
* 08:45 effie: depool mw2383 - [[phab:T286463|T286463]]
* 08:02 moritzm: upgrade bullseye pilot installs to latest state of bullseye
* 07:06 moritzm: installing apache security updates on codfw mw* hosts
* 06:53 elukey: systemc