Difference between revisions of "Server Admin Log"

== 2021-08-03 ==
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master [[phab:T287574|T287574]]
* 09:12 moritzm: installing php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)
== 2021-08-02 ==
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 21:31 tzatziki: removing 1 file for legal compliance
* 21:16 tzatziki: removing 7 files for legal compliance
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 19:00 urbanecm: Morning B&C window completed
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 12:20 mutante: gerrit servers: disabling puppet
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 11:27 hashar: restarting Jenkins on contint2001
* 11:27 hashar: restarting Jenkins on contint1001
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 urbanecm: EU B&C window completed
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 11:08 moritzm: installing openjdk-11 security updates
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 07:24 moritzm: installing libsndfile security updates on buster
* 07:12 moritzm: installing aspell security updates
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)
== 2021-07-31 ==
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
== 2021-07-30 ==
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 11:23 moritzm: installing libsndfile security updates on stretch
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json
== 2021-07-29 ==
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:11 vgutierrez: restart pybal on lvs2009
* 14:09 vgutierrez: restart pybal on lvs2010
* 14:07 vgutierrez: restart pybal on lvs2008
* 14:05 vgutierrez: restart pybal on lvs2007
* 13:59 vgutierrez: restart pybal on lvs1014
* 13:55 vgutierrez: restart pybal on lvs1015
* 13:52 _joe_: restarting pybal on lvs1016
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:52 moritzm: restarting Tomcat on idp-test
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
== 2021-07-28 ==
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 13:29 moritzm: installing python2.7 security updates on stretch
* 13:08 moritzm: installing python3.5 security updates on stretch
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:27 moritzm: installing nginx security updates on thumbor*
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:27 Amir1: running several long-running queries against pc1007
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 07:53 moritzm: installing aspell security updates on stretch
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php
== 2021-07-27 ==
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 14:33 mmandere: depool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:11 moritzm: installing aspell security updates
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support.  Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 11:23 Lucas_WMDE: EU backport+config window done
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 08:57 _joe_: repooling mw225[12] for apis
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 08:36 jynus: reenabled puppet on mwmaint1002
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 07:52 jynus: disabling puppet on mwmaint1002
* 07:14 moritzm: installing krb security updates on buster
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json
== 2021-07-26 ==
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}}
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 16:06 legoktm: depooled mw2336.codfw.wmnet, mgmt is down too. [[phab:T287394|T287394]]
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 06:39 moritzm: installing krb5 security updates
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki
== 2021-07-24 ==
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php  --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
== 2021-07-23 ==
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 16:15 effie: enable puppet on mc-gp* hosts
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]
== 2021-07-22 ==
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platform migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 14:27 moritzm: installing libwebp security updates on stretch
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 11:36 Lucas_WMDE: EU backport+config window done
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:42 effie: restart pybal on lvs2010 and lvs1016
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
== 2021-07-21 ==
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:27 dancy: testing upcoming Scap release on beta
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 10:50 moritzm: installing systemd security updates on bullseye
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 10:14 effie: enable puppet on mw* servers
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 08:45 effie: disable puppet on codfw mw hosts to deploy 702592
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:17 effie: enable puppet on alert*
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 07:16 godog: powercycle ms-be2048
* 07:03 moritzm: installing systemd security updates on stretch
* 06:51 effie: restart memcached on eqiad mc* hosts
* 06:51 effie: enable puppet on mc* hosts
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
== 2021-07-20 ==
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 17:06 rzl: enabled puppet on A:mw
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
* 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 14:53 urbanecm: Start server-side upload for 7 large PNG files ([[phab:T285708|T285708]])
* 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:46 vgutierrez: depool dns1002 - [[phab:T286069|T286069]]
* 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
* 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10{{!}}0[1-9]).codfw.wmnet
* 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - [[phab:T285643|T285643]]
* 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 12:44 moritzm: installing systemd security updates on buster
* 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
* 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
* 11:58 Lucas_WMDE: EU config+backport window done
* 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (3/3) (duration: 00m 56s)
* 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (2/3) (duration: 00m 56s)
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (1/3) (duration: 00m 56s)
* 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 3/3) (duration: 00m 56s)
* 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 2/3) (duration: 00m 56s)
* 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 1/3) (duration: 00m 57s)
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705498{{!}}Add patroller group for ckbwiki (T285221)]] (duration: 00m 57s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (2/2, beta) (duration: 00m 56s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (1/2, prod) (duration: 00m 57s)
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704867{{!}}Update config for language switching on pilot wikis (T286459)]] (duration: 00m 59s)
* 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
* 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
* 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
* 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
* 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
* 03:17 eileen: civicrm revision changed from {{Gerrit|20e9ef6bbb}} to {{Gerrit|819c11307d}}, config revision is {{Gerrit|bb405c5232}}
== 2021-07-19 ==
* 20:48 urbanecm: Deploy security patch for [[phab:T286884|T286884]]
* 20:29 vgutierrez: pool text@codfw - [[phab:T286921|T286921]]
* 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
* 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467{{!}}prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)
* 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
* 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448{{!}}Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)
* 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - [[phab:T286921|T286921]]
* 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - [[phab:T286921|T286921]]
* 18:46 brennen: gerrit1001: restarting gerrit
* 18:40 vgutierrez: stop pybal on lvs2009 - [[phab:T286921|T286921]]
* 18:38 brennen: re-enabling puppet on gerrit1001
* 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - [[phab:T286921|T286921]]
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P<nowiki>{</nowiki>cloudelastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - [[phab:T286921|T286921]]
* 18:20 vgutierrez: enabling pybal on lvs2007 - [[phab:T286921|T286921]]
* 18:19 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
* 18:06 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:705437{{!}}pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s)
* 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
* 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
* 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
* 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
* 17:30 volans: running puppet on elastic2038 after network was restored
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
* 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:23 volans: running authdns-update to force-update authdns2001
* 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:21 XioNoX: remove ns1 redirect - [[phab:T286787|T286787]]
* 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
* 17:10 XioNoX: enable asw-a2-codfw access ports - [[phab:T286787|T286787]]
* 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - [[phab:T286787|T286787]]
* 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
* 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
* 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
* 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
* 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
* 16:40 XioNoX: update asw-a2-codfw serial number - [[phab:T286787|T286787]]
* 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
* 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
* 16:21 mutante: depooled logstash2021 for dcops maintenance work
* 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
* 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
* 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|310be45f7}} (duration: 00m 57s)
* 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
* 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
* 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2bdfbd258e}} (duration: 00m 57s)
* 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I069c7b53}} (duration: 00m 58s)
* 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
* 15:10 godog: +100G to prometheus/ops in codfw
* 14:59 vgutierrez: rolling restart of eqiad pybal instances
* 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:42 vgutierrez: rolling restart of codfw pybal instances
* 14:33 vgutierrez: rolling restart of eqsin pybal instances
* 14:23 vgutierrez: rolling restart of ulsfo pybal instances
* 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
* 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
* 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
* 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
* 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
* 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
* 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
* 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
* 11:40 moritzm: installing bluez security updates
* 11:31 Lucas_WMDE: EU backport+config window done
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:703205{{!}}Add config for updated PropertySuggester beta cluster (T285098)]] (beta-only) (duration: 00m 57s)
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 09:52 moritzm: imported megacli for bullseye-wikimedia [[phab:T282272|T282272]] [[phab:T275873|T275873]]
* 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
* 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
* 08:15 vgutierrez: depool codfw text traffic
* 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
* 03:26 twentyafterfour: restarted phd on phab1001
* 03:25 twentyafterfour: investigating PHD failure
== 2021-07-16 ==
* 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
* 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
* 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P<nowiki>{</nowiki>elastic2*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
* 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
* 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
* 15:48 vgutierrez: restart pybal on lvs2010
* 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 15:24 godog: downtime flappy pages in codfw for 40 minutes
* 15:14 godog: set alert2001 as active in netbox (was staged) - [[phab:T247966|T247966]]
* 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
* 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
* 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw [[phab:T286787|T286787]]
* 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
* 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
* 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
* 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
* 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
* 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers ([[phab:T279309|T279309]])
* 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 12:39 mutante: mw1412 through mw1428 - set to active in netbox ([[phab:T279309|T279309]])
* 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
* 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
* 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
* 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
* 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
* 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
* 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
* 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
* 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
* 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
* 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
== 2021-07-15 ==
* 23:32 brennen: checking stashbot: [[phab:T286756|T286756]]
* 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: [[gerrit:704815{{!}}Fix creation of mw.Message objects (T286385)]] (duration: 00m 57s)
* 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # [[phab:T285811|T285811]]
* 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # [[phab:T285811|T285811]]
* 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # [[phab:T285811|T285811]]
* 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
* 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki [[phab:T284928|T284928]]
* 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
* 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 06s)
* 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 07s)
* 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
* 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
* 16:40 ejegg: updated payments-wiki from {{Gerrit|d9892207c1}} to {{Gerrit|844b59ee42}}
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 16:27 ejegg: updated fundraising CiviCRM from {{Gerrit|e0d53c92b5}} to {{Gerrit|20e9ef6bbb}}
* 16:24 ejegg: updated payments-wiki from {{Gerrit|0e7800027a}} to {{Gerrit|844b59ee42}}
* 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704773{{!}}flaggedrevs: Allow admins of idwiki to change stablesettings (T268317)]], try II (duration: 01m 05s)
* 15:03 Amir1: temporarily becoming admin on idwiki to debug [[phab:T268317|T268317]]
* 15:02 moritzm: installing nginx security updates on ms-fe*
* 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
* 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
* 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
* 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
* 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
* 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
* 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
* 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 13:05 mutante: mw1413 - pooling, was depooled but for unknown reason, don't see it in SAL, looks ok, scap pulled
* 13:03 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1422.eqiad.wmnet
* 13:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[3-5].eqiad.wmnet
* 12:54 mutante: mw1423, mw1424, mw1425 - pooled as new API servers
* 12:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:53 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[3-5].eqiad.wmnet
* 12:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:34 mutante: mw1423, mw1424, mw1425 - scap pull
* 12:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:09 mutante: mw1423,mw1424,mw1425 - rebooting
* 11:48 moritzm: restarting restbase1028-1030 to pick up libuv security update
* 11:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:47 mutante: mw1423, mw1424, mw1425 - initial puppet run, new API appservers going into production
* 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704527{{!}}Make idwiki use protect mode of flaggedrevs (T268317)]] (duration: 01m 07s)
* 11:40 moritzm: restarting Etherpad to pick up libuv security update
* 11:37 moritzm: restarting Turnilo to pick up libuv security update
* 11:34 moritzm: installing libuv1 security updates
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 10 hosts
* 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 11:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 11:05 volans@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 10:56 volans: commented out cron-spam entries on thanos-fe2001, puppet is disabled, thanos-store.service fails to start - [[phab:T285835|T285835]]
* 10:41 godog: move wikibase.queryService.ui.app to wikibase.queryService.ui.index.app - [[phab:T272128|T272128]]
* 10:34 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 10:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 10:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 10:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 10:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:26 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 10:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 10:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 10:02 effie: disabling puppet on maps* for 704394
* 09:38 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:11 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-8].eqiad.wmnet
* 09:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:29 elukey: sudo rm /etc/rawdog/en/feeds/847a7185.state* on planet1002 (corrupted file) - backup in /home/elukey + restart planet-update-en.service
* 08:12 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-8].eqiad.wmnet
* 08:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 08:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 07:48 moritzm: updated bullseye d-i image for latest daily build [[phab:T275873|T275873]]
* 07:31 godog: reimage thanos-fe2001 with bullseye - [[phab:T285835|T285835]]
* 07:23 elukey: restart planet-update-en.service on planet1002
* 07:17 elukey: remove /etc/rawdog/en/<nowiki>{</nowiki>state,state.lock<nowiki>}</nowiki> on planet1002 (following what rawdog suggested) due to corrupted files (backups available in /home/elukey/en)
* 06:51 elukey: restart phabricator_clean_tmp_files.service on phab1001 - transient error (tmp files already cleaned up)
* 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 06s)
* 06:47 tstarling@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 07s)
* 05:50 kart_: Updated cxserver to 2021-07-14-124232-production ([[phab:T282369|T282369]], [[phab:T284450|T284450]])
* 05:47 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:43 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:41 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 00:00 twentyafterfour: phabricator update deployed.
== 2021-07-14 ==
* 23:23 eileen: civicrm revision changed from {{Gerrit|b1c63470bb}} to {{Gerrit|e0d53c92b5}}, config revision is {{Gerrit|bb405c5232}}
* 21:19 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.14 (duration: 01m 05s)
* 21:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.14
* 21:08 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/User.php: Backport: [[gerrit:704609{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 20:58 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/User.php: Backport: [[gerrit:704608{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 20:51 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.14
* 19:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/resources: Backport: [[gerrit:704606{{!}}Fix deprecated offset() on invalid DOM (T185629)]] (duration: 01m 07s)
* 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@156a984]: fix trove-dashboard bug (duration: 04m 18s)
* 19:26 andrew@deploy1002: Started deploy [horizon/deploy@156a984]: fix trove-dashboard bug
* 19:17 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 19:17 nskaggs@cumin1001: Added views for new wiki: dagwiki [[phab:T284456|T284456]]
* 18:55 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:54 nskaggs@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
* 18:54 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:36 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 18:36 nskaggs@cumin1001: Added views for new wiki: banwikisource [[phab:T284390|T284390]]
* 18:30 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:14 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 17:52 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 17:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 17:49 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 17:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 17:39 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 17:35 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 17:35 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704383{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 06s)
* 17:00 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704382{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 05s)
* 16:27 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 16:26 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 16:11 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Translate: Backport: [[gerrit:704404{{!}}TranslationAid: Handle empty message definition (T285830)]] and [[gerrit:704405{{!}}TranslationAid: Make sure to return successfully fetched definitions (T285830)]] (duration: 01m 09s)
* 16:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:37 moritzm: installing klibc security updates
* 15:36 ottomata: deploying eventgate-analytics with direct service-runner prometheus support
* 15:34 ryankemper: [Elastic] Manually triggering readahead mitigation across whole fleet to prevent any further issues today: `ryankemper@cumin1001:~$ sudo cumin -b 12 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl restart elasticsearch-disable-readahead.service'` (still need to investigate why `elasticsearch-disable-readahead.timer` isn't re-firing every 30 mins as desired)
* 15:34 moritzm: installing apache security updates on otrs1001 (ticket.wikimedia.org)
* 15:34 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:28 urbanecm: Start server-side upload of 3 large image files ([[phab:T285708|T285708]])
* 15:16 moritzm: installing apache security updates on lists1001 (lists.wikimedia.org)
* 14:51 moritzm: installing apache security updates on puppet masters
* 14:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2384.codfw.wmnet
* 14:47 effie: set mw2384 as inactive to investigate mw2383 issue - [[phab:T286463|T286463]]
* 14:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 14:44 moritzm: installing apache security updates on grafana*
* 14:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 14:43 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 14:40 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:40 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1422.eqiad.wmnet
* 14:33 dcausse: running elasticsearch-madvise-random ES_PID on elastic2045
* 14:31 dcausse: running elasticsearch-madvise-random 1022 on elastic2054
* 14:23 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:13 elukey: restart php-fpm on mw2370
* 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 13:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 13:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277118|T277118]]
* 13:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277118|T277118]]
* 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1005.eqiad.wmnet
* 12:43 urbanecm: Start server-side upload of 3 large image files ([[phab:T285708|T285708]])
* 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1005.eqiad.wmnet
* 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 12:23 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 12:15 mutante: mw1422 - scap pull
* 12:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1422.eqiad.wmnet
* 12:02 moritzm: upgrading python3-wmflib fleetwide to 0.0.8 (needed for new logout.d wrapper)
* 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
* 12:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
* 11:52 mutante: mw1422 - new setup, not in prod yet
* 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
* 11:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
* 11:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 11:49 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704525{{!}}Remove reviewer user group in ruwiki (T284589)]] (duration: 01m 05s)
* 11:40 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
* 11:39 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:700854{{!}}flaggedrevs: Reduce levels for ruwiki to 1 (T284589)]] (duration: 01m 05s)
* 11:37 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
* 11:23 ariel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|72027e136f10867f5db02043b7505390e49130d1}}: Disable indexing in NS_USER and NS_USER_TALK on bnwiki ([[phab:T286152|T286152]]) (duration: 02m 07s)
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4dc11d2333cbf70a4eb20f3fb94a9e363b41d2df}}: Change category name of Babel extension on Javanese Wikipedia ([[phab:T286165|T286165]]) (duration: 02m 10s)
* 10:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 09:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277118|T277118]]
* 09:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277118|T277118]]
* 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277118|T277118]]
* 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277118|T277118]]
* 09:27 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php-1.37.0-wmf.14]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=testwiki # [[phab:T285811|T285811]]
* 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277118|T277118]]
* 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277118|T277118]]
* 07:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277118|T277118]]
* 07:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277118|T277118]]
* 07:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277118|T277118]]
* 07:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277118|T277118]]
* 07:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T277118|T277118]]
* 07:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T277118|T277118]]
* 00:58 eileen: process control updated to {{Gerrit|c291b3c6890364281d}}
* 00:58 eileen: {{Gerrit|c291b3c6890364281d}}

Revision as of 23:34, 3 August 2021

2021-08-03

  • 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Re-enable commonswiki sister search (T277225) (duration: 01m 07s)
  • 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for T287988 (T281158)
  • 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
  • 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: 7d286dc: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 3/3) (duration: 01m 07s)
  • 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: 7d286dc: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 2/3) (duration: 01m 07s)
  • 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: 7d286dc: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 1/3) (duration: 01m 07s)
  • 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
  • 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
  • 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: T286463
  • 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: T286463
  • 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 1/3) (duration: 00m 37s)
  • 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 2/3) (duration: 00m 37s)
  • 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 2/3) (duration: 01m 07s)
  • 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 1/3) (duration: 01m 08s)
  • 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
  • 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
  • 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
  • 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
  • 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
  • 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
  • 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
  • 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
  • 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
  • 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:31 ryankemper: T285355 `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
  • 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
  • 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
  • 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
  • 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
  • 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
  • 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
  • 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
  • 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer (T286853)
  • 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
  • 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
  • 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: T286642 fixes to bulk daemon prioritization (duration: 00m 48s)
  • 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: T286642 fixes to bulk daemon prioritization
  • 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
  • 16:59 hashar: Gerrit has been upgraded
  • 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
  • 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
  • 16:45 urbanecm: Start server side upload for 1 video file (T287957)
  • 16:45 hashar: Stopping Gerrit for upgrade
  • 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
  • 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
  • 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
  • 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
  • 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) T286206
  • 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
  • 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
  • 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
  • 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
  • 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
  • 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
  • 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
  • 12:47 moritzm: restarting Tomcat on idp1001
  • 12:05 moritzm: installing libgcrypt20 security updates
  • 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
  • 11:36 moritzm: updated bullseye d-i images to rc3 T275873
  • 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - T222113
  • 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
  • 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - T222113
  • 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:13 moritzm: rename Ganeti group for test cluster to row_D T286206
  • 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
  • 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
  • 09:18 marostegui: Failover m1, m2 and m3-master T287574
  • 09:12 moritzm: installing php 7.0 security updates on stretch
  • 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - T286054
  • 08:57 moritzm: installing pillow security updates on stretch
  • 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
  • 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
  • 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
  • 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
  • 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
  • 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
  • 06:31 kart__: Updated cxserver to 2021-08-02-164000-production (T286473)
  • 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
  • 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
  • 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
  • 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
  • 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
  • 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
  • 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)

2021-08-02

  • 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
  • 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:21 legoktm: Previous sync also deployed c38998f03f "Stop enabling DPL on new wikis" (T287380)
  • 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
  • 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
  • 21:31 tzatziki: removing 1 file for legal compliance
  • 21:16 tzatziki: removing 7 files for legal compliance
  • 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features (T287868, T287874, T287873)
  • 19:00 urbanecm: Morning B&C window completed
  • 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: bebf4a9: Enable Growth features on a couple of wikis in dark mode (T287868, T287874, T287873; 2/2) (duration: 00m 56s)
  • 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bebf4a9: Enable Growth features on a couple of wikis in dark mode (T287868, T287874, T287873; 1/2) (duration: 00m 57s)
  • 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - T287652 (duration: 00m 56s)
  • 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features (T287876, T287871, T287878, T287880, T287875, T287879, T287872)
  • 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 18cd360: Growth features: Enable features in dark mode on a few wikis (T287876, T287871, T287878, T287880, T287875, T287879, T287872; 2/2) (duration: 00m 56s)
  • 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 18cd360: Growth features: Enable features in dark mode on a few wikis (T287876, T287871, T287878, T287880, T287875, T287879, T287872; 1/2) (duration: 00m 56s)
  • 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis (T287876, T287871, T287878, T287880, T287875, T287879, T287872)
  • 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ee47f9d: Add rollbacker group for kswiki (T286789) (duration: 00m 56s)
  • 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eec997c: Enable SUL autologin for wikimania.wikimedia.org (T285197) (duration: 00m 55s)
  • 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: 05cf1d6: Add a link: Show article extract instead of description in the link inspector (T287636; 2/2) (duration: 00m 56s)
  • 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: 05cf1d6: Add a link: Show article extract instead of description in the link inspector (T287636; 1/2) (duration: 00m 57s)
  • 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cc8ca45: Add tewikisource as import source for tewikibooks (T286978) (duration: 00m 56s)
  • 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 11e96ba: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T287264) (duration: 00m 56s)
  • 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: 97b6897: Remove unused enwiki celebration logos (T272108) (duration: 00m 57s)
  • 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: 16f9794: Remove unused eswiki celebration logos (T280908) (duration: 00m 57s)
  • 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 15:44 jynus: remove s2 from db1139 T287230
  • 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
  • 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
  • 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
  • 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
  • 13:02 mutante: gerrit1001 - restarting service after 706049
  • 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
  • 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
  • 12:20 mutante: gerrit servers: disabling puppet
  • 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: T287528 (duration: 00m 57s)
  • 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: T287780 (duration: 00m 57s)
  • 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
  • 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
  • 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
  • 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: T287782 (duration: 00m 56s)
  • 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
  • 11:29 hashar: restarting gerrit primary server on gerrit1001
  • 11:27 hashar: restarting Jenkins on contint2001
  • 11:27 hashar: restarting Jenkins on contint1001
  • 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
  • 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
  • 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:13 urbanecm: EU B&C window completed
  • 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 43020b7: votewiki: Enable Single Transferable Vote (T283728) (duration: 00m 57s)
  • 11:08 moritzm: installing openjdk-11 security updates
  • 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 26bcaaf: Restore logging for mediamoderation script to better understand high error rate occurring when running script (T287511) (duration: 00m 57s)
  • 07:53 moritzm: catch up bullseye installs with latest state of testing
  • 07:24 moritzm: installing libsndfile security updates on buster
  • 07:12 moritzm: installing aspell security updates
  • 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)

2021-07-31

2021-07-30

  • 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 22:20 eileen: civicrm revision is 158ed65e00, config revision is 6011d9c471
  • 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
  • 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - T287760 (duration: 00m 57s)
  • 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
  • 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
  • 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
  • 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
  • 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
  • 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
  • 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
  • 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
  • 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
  • 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
  • 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
  • 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
  • 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
  • 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
  • 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
  • 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
  • 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
  • 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
  • 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
  • 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
  • 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
  • 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
  • 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
  • 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
  • 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
  • 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
  • 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
  • 13:26 joe: uploaded docker-report 0.0.13 to buster
  • 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
  • 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
  • 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
  • 11:23 moritzm: installing libsndfile security updates on stretch
  • 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
  • 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
  • 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
  • 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
  • 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. T284592
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
  • 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails T286273 (duration: 00m 57s)
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
  • 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails T286273 (duration: 00m 57s)
  • 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json

2021-07-29

  • 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Merge new configs with existing testwiki definition (duration: 00m 57s)
  • 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16 refs T281157
  • 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704) (duration: 01m 09s)
  • 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15 refs T281157
  • 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16 refs T281157
  • 18:37 urbanecm@deploy1002: Finished scap: 796fe8e: 927763c: SecurePoll backports (T283728, T284585) (duration: 17m 06s)
  • 18:19 urbanecm@deploy1002: Started scap: 796fe8e: 927763c: SecurePoll backports (T283728, T284585)
  • 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
  • 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: 9a2383d: Display: Use HTML "dir" attribute for ltr/rtl (T287649) (duration: 01m 25s)
  • 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
  • 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
  • 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:11 mmandere: pool lvs1013.eqiad.wmnet - T286032
  • 15:09 mmandere: pool dns1001.wikimedia.org - T286032
  • 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - T286032
  • 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
  • 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
  • 14:46 mmandere: depool lvs1013 - T286032
  • 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
  • 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
  • 14:39 mmandere: depool dns1001 - T286032
  • 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
  • 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
  • 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
  • 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
  • 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - T286032
  • 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:11 vgutierrez: restart pybal on lvs2009
  • 14:09 vgutierrez: restart pybal on lvs2010
  • 14:07 vgutierrez: restart pybal on lvs2008
  • 14:05 vgutierrez: restart pybal on lvs2007
  • 13:59 vgutierrez: restart pybal on lvs1014
  • 13:55 vgutierrez: restart pybal on lvs1015
  • 13:52 _joe_: restarting pybal on lvs1016
  • 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
  • 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
  • 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
  • 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
  • 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 T287230', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
  • 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
  • 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
  • 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
  • 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
  • 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
  • 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
  • 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
  • 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 07:52 moritzm: restarting Tomcat on idp-test
  • 06:41 XioNoX: push pfw policies - T287203
  • 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
  • 01:08 eileen: civicrm revision changed from 739c936298 to 158ed65e00, config revision is 6011d9c471

2021-07-28

  • 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wgSkipSkins: Update defaults, hide modern (T287616) (duration: 01m 06s)
  • 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: Disable mobile contributions simplifications on Wikidata and Commons (T283988) (duration: 01m 58s)
  • 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16 refs T281157 (duration: 01m 06s)
  • 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16 refs T281157
  • 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
  • 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
  • 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
  • 18:14 jbond: manually cleared out the puppetdb2002 queue
  • 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
  • 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
  • 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 15:58 ryankemper: T287112 [WDQS] Re-pooled `wdqs2002`
  • 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
  • 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing (T279309)
  • 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
  • 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
  • 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
  • 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
  • 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
  • 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
  • 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
  • 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
  • 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
  • 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
  • 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
  • 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
  • 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
  • 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
  • 13:29 moritzm: installing python2.7 security updates on stretch
  • 13:08 moritzm: installing python3.5 security updates on stretch
  • 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 11:27 moritzm: installing nginx security updates on thumbor*
  • 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
  • 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 10:11 moritzm: installing remaining nginx security updates on stretch
  • 10:09 godog: temp fix prometheus-icinga-am on alert1001
  • 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 09:40 urbanecm: Start server-side upload for 1 video file (T287482)
  • 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
  • 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
  • 08:27 Amir1: running several long-running queries against pc1007
  • 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 07:53 moritzm: installing aspell security updates on stretch
  • 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: T287559
  • 07:07 godog: remove cloud*/syslog.log from centrallog2001 - T287559
  • 07:06 godog: remove node_pinger.prom from node-pinger hosts
  • 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
  • 02:43 TimStarling: on mwmaint2002 fixing T286273 broken files using eval.php

2021-07-27

  • 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: Restore print, links, table and message box styles (T278896) (duration: 01m 07s)
  • 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable user links on office + test wikis (T287391) (duration: 02m 00s)
  • 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
  • 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
  • 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
  • 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
  • 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
  • 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
  • 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
  • 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - T287210 (duration: 02m 28s)
  • 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
  • 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
  • 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
  • 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
  • 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
  • 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
  • 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
  • 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
  • 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
  • 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
  • 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
  • 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
  • 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - T287238
  • 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) T147505
  • 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
  • 15:17 mmandere: pool lvs1014.eqiad.wmnet - T286061
  • 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
  • 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
  • 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 T286061
  • 15:11 mmandere: pool authdns1001.wikimedia.org - T286061
  • 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
  • 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - T286061
  • 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
  • 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
  • 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
  • 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 T287230', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
  • 14:53 moritzm: disabling puppet for upcoming row B maintenance
  • 14:52 mmandere: depool lvs1014 - T286061
  • 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
  • 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
  • 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
  • 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
  • 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
  • 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
  • 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
  • 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
  • 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
  • 14:40 mmandere: depool authdns1001 - T286061
  • 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
  • 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
  • 14:33 mmandere: depool cp10[79-82].eqiad.wmnet - T286061
  • 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
  • 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
  • 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - T287238
  • 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
  • 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 T287230', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
  • 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:11 moritzm: installing aspell security updates
  • 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
  • 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
  • 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:30 ottomata: deploying eventgate-analytics with native prometheus support. Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
  • 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
  • 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
  • 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
  • 11:23 Lucas_WMDE: EU backport+config window done
  • 11:20 oblivian@deploy1002: Synchronized debug.json: Config: Add the experimental kubernetes backend to mwdebug (T283056) (duration: 00m 56s)
  • 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add stream configuration for ContentTranslation events (T281982) (duration: 00m 58s)
  • 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
  • 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
  • 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
  • 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
  • 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
  • 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
  • 09:52 jynus: reverting query killer parameters on s3 codfw replicas
  • 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
  • 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
  • 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
  • 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
  • 08:57 _joe_: repooling mw225[12] for apis
  • 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
  • 08:36 jynus: reenabled puppet on mwmaint1002
  • 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
  • 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
  • 07:52 jynus: disabling puppet on mwmaint1002
  • 07:14 moritzm: installing krb security updates on buster
  • 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - T287238
  • 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Enable request language for RDF stubs in testwikidatawiki (T285795), Part II (duration: 00m 56s)
  • 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable request language for RDF stubs in testwikidatawiki (T285795), Part I (duration: 00m 57s)
  • 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 T287230', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json

2021-07-26

  • 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
  • 18:30 cstone: SmashPig revision changed from be272c02ce to 020d4eccd4
  • 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - T287394
  • 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
  • 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
  • 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
  • 16:06 legoktm: depooled mw2336.codfw.wmnet, mgmt is down too. T287394
  • 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
  • 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # T287122
  • 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: Don’t generate current content text twice, Part II (duration: 01m 49s)
  • 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: Don’t generate current content text twice, Part I (duration: 01m 50s)
  • 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
  • 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
  • 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
  • 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:15 XioNoX: rollback sampling for T286038
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
  • 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 07:18 _joe_: docker-image prune on deneb T287222
  • 07:17 _joe_: manage-production-images prune on deneb, T287222
  • 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
  • 06:39 moritzm: installing krb5 security updates
  • 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki

2021-07-24

  • 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see Phab:T280392 and Phab:T280397' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # T287321

2021-07-23

  • 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - T287110
  • 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw T287110
  • 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
  • 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 16:15 effie: enable puppet on mc-gp* hosts
  • 15:47 papaul: powerdown wdqs2002 for IDRAC reset
  • 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - T287238
  • 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging T285384
  • 14:16 brennen: gitlab1001: running ansible to deploy fix puma exporter listen address (T275170)
  • 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - T271232 (duration: 03m 32s)
  • 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - T271232
  • 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
  • 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
  • 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
  • 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
  • 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - T287244
  • 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
  • 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
  • 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
  • 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
  • 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
  • 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
  • 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
  • 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
  • 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
  • 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
  • 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
  • 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
  • 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
  • 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
  • 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
  • 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
  • 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
  • 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
  • 03:11 ryankemper: T287223 Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
  • 03:09 ryankemper: T287223 Installed `nginx-light` on all of `elastic1*` (eqiad)
  • 03:06 ryankemper: T287223 Installed `nginx-light` on all of `elastic2*` (codfw)
  • 02:53 ejegg: updated Fundraising CiviCRM from 819c11307d to 739c936298
  • 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
  • 01:28 ejegg: updated payments-wiki from 844b59ee42 to cc5d14ea7f
  • 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # T287222

2021-07-22

  • 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: Make sure enable responsive mode UI reflects actual preference value (T285402) (duration: 00m 56s)
  • 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platform migrations - T282855 T238138 T282562 T271168 (duration: 00m 55s)
  • 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
  • 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
  • 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
  • 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
  • 19:00 urbanecm: Start server-side upload for 1 video file (T287061)
  • 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - T271232 (duration: 03m 22s)
  • 18:58 urbanecm: Start server-side upload for 1 video file (T286489)
  • 18:56 urbanecm: Start server-side upload for 1 video file (T286665)
  • 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - T271232
  • 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # T286500
  • 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 26c23de: hewikisource: Add namespace aliases (T286500) (duration: 00m 55s)
  • 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
  • 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 599c220: enwikisource: Create upload-shared user group (T285130) (duration: 00m 56s)
  • 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - T271232 (duration: 03m 18s)
  • 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - T271232
  • 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6a90930: Enable the visual editor on the 2021 namespace on Wikimania wiki (T287197) (duration: 00m 55s)
  • 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f765832: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T287204) (duration: 00m 55s)
  • 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 18:10 legoktm: testing dc switchover warmup script in eqiad
  • 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
  • 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
  • 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
  • 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
  • 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
  • 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
  • 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
  • 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
  • 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
  • 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
  • 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
  • 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
  • 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
  • 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
  • 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
  • 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
  • 16:56 brennen: gitlab1001: running ansible to deploy gerrit:706396 (T275170)
  • 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
  • 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
  • 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
  • 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
  • 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
  • 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
  • 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
  • 15:45 marostegui: Stop db2091 for onsite maintenance
  • 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
  • 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
  • 15:14 mmandere: pool lvs1015 - T286065
  • 15:14 jynus: shutdown db2097 for hw servicing T287072
  • 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
  • 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - T286065
  • 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
  • 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
  • 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
  • 14:47 mmandere: depool lvs1015 - T286065
  • 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
  • 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
  • 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - T286065
  • 14:29 effie: restarting pybal in lvs2009 and lvs1015
  • 14:27 moritzm: installing libwebp security updates on stretch
  • 14:25 effie: restarting pybal in lvs2010 and lvs1016
  • 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0208fc2: Growth: Add mentor dashboard related config (T278920) (duration: 00m 55s)
  • 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
  • 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
  • 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
  • 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
  • 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
  • 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
  • 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
  • 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
  • 11:36 Lucas_WMDE: EU backport+config window done
  • 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
  • 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: Avoid using MWHttpRequest::factory() (2/2) (duration: 01m 04s)
  • 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: Avoid using MWHttpRequest::factory() (1/2) (duration: 01m 04s)
  • 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: Avoid using WikiPage::factory() (duration: 01m 06s)
  • 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
  • 10:45 effie: restart pybal on lvs2009 and lvs1015
  • 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
  • 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
  • 10:42 effie: restart pybal on lvs2010 and lvs1016
  • 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
  • 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
  • 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
  • 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
  • 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
  • 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - T287110
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - T287110
  • 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
  • 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump T286888', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
  • 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
  • 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
  • 05:31 ryankemper: T281327 [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
  • 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
  • 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE

2021-07-21

  • 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis (T257066) (duration: 01m 03s)
  • 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:41 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
  • 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
  • 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:27 dancy: testing upcoming Scap release on beta
  • 18:27 ryankemper: T281327 [Elastic] `sudo -i wmf-auto-reimage-host -p T281327 elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
  • 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
  • 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
  • 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
  • 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
  • 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: 1453831: Do not teardown newtopictool interface if it was not setup (T287035) (duration: 01m 04s)
  • 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: aca510b: Do not teardown newtopictool interface if it was not setup (T287035) (duration: 01m 05s)
  • 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
  • 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
  • 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
  • 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # T285811
  • 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085) (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
  • 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
  • 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
  • 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
  • 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
  • 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
  • 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
  • 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
  • 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
  • 15:17 moritzm: installing intel-microcode security updates on stretch
  • 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
  • 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for T286679 (duration: 04m 45s)
  • 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for T286679
  • 14:40 papaul: powerdown ms-be2038 for BBU replacement
  • 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
  • 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # T280197 (duration: 00m 09s)
  • 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # T280197
  • 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
  • 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
  • 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 T287036
  • 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
  • 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
  • 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
  • 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
  • 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
  • 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
  • 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
  • 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
  • 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
  • 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
  • 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
  • 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
  • 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
  • 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
  • 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d6699da: GrowthExperiments: Add more wikis to linkrecommendation experiment (T284481) (duration: 01m 31s)
  • 10:50 moritzm: installing systemd security updates on bullseye
  • 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
  • 10:14 effie: enable puppet on mw* servers
  • 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - T287038
  • 09:34 jynus: restart db2097 T287072
  • 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
  • 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # T281156 (duration: 45m 51s)
  • 08:45 effie: disable puppet on codfw mw hosts to deploy 702592
  • 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 08:31 godog: upgrade karma on alert hosts - T284213
  • 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 T281058
  • 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 T281058
  • 08:17 effie: enable puppet on alert*
  • 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # T281156
  • 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
  • 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
  • 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 07:56 XioNoX: push extra sampling on cr2-eqiad - T286038
  • 07:44 XioNoX: push extra sampling on cr1-eqiad - T286038
  • 07:38 XioNoX: update RIS peer IP on cr2-codfw
  • 07:16 godog: powercycle ms-be2048
  • 07:03 moritzm: installing systemd security updates on stretch
  • 06:51 effie: restart memcached on eqiad mc* hosts
  • 06:51 effie: enable puppet on mc* hosts
  • 06:35 effie: disable puppet on mc1* hosts and icinga - T271967
  • 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-07-20

  • 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: caa5a07: Set wgGEMentorDashboardBackendEnabled properly (T285811) (duration: 00m 57s)
  • 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: dafd953: updateMenteeData: Make it possible to disable script per-wiki (T285811) (duration: 00m 58s)
  • 18:57 urbanecm: Start server-side upload for 4 large PNG files (T285708)
  • 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
  • 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
  • 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
  • 17:06 rzl: enabled puppet on A:mw
  • 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
  • 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
  • 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
  • 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
  • 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
  • 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
  • 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
  • 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
  • 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
  • 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
  • 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
  • 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
  • 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
  • 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:23 vgutierrez: pool dns1002 - T286069
  • 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - T286069
  • 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
  • 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
  • 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
  • 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T281058
  • 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T281058
  • 14:53 urbanecm: Start server-side upload for 7 large PNG files (T285708)
  • 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
  • 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
  • 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
  • 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
  • 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
  • 14:46 vgutierrez: depool dns1002 - T286069
  • 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
  • 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
  • 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - T286069
  • 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 T281058
  • 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 T281058
  • 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T281058
  • 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T281058
  • 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
  • 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
  • 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T281058
  • 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T281058
  • 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
  • 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10|0[1-9]).codfw.wmnet
  • 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T281058
  • 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T281058
  • 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - T285643
  • 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T281058
  • 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T281058
  • 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
  • 12:44 moritzm: installing systemd security updates on buster
  • 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
  • 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
  • 11:58 Lucas_WMDE: EU config+backport window done
  • 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: Avoid using User::newFrom* methods (3/3) (duration: 00m 56s)
  • 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
  • 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
  • 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Avoid using User::newFrom* methods (2/3) (duration: 00m 56s)
  • 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: Avoid using User::newFrom* methods (1/3) (duration: 00m 56s)
  • 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: e52ae37: otrs_wikiwiki: Update logo to use VRT instead of OTRS (T280400; 3/3) (duration: 00m 56s)
  • 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: e52ae37: otrs_wikiwiki: Update logo to use VRT instead of OTRS (T280400; 2/3) (duration: 00m 56s)
  • 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: e52ae37: otrs_wikiwiki: Update logo to use VRT instead of OTRS (T280400; 1/3) (duration: 00m 57s)
  • 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add patroller group for ckbwiki (T285221) (duration: 00m 57s)
  • 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: Typo fix: "the the" -> "the" (T201491) (2/2, beta) (duration: 00m 56s)
  • 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Typo fix: "the the" -> "the" (T201491) (1/2, prod) (duration: 00m 57s)
  • 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update config for language switching on pilot wikis (T286459) (duration: 00m 59s)
  • 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
  • 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
  • 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T281058
  • 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T281058
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
  • 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
  • 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
  • 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
  • 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
  • 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
  • 03:17 eileen: civicrm revision changed from 20e9ef6bbb to 819c11307d, config revision is bb405c5232

2021-07-19

  • 20:48 urbanecm: Deploy security patch for T286884
  • 20:29 vgutierrez: pool text@codfw - T286921
  • 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877) (duration: 00m 58s)
  • 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
  • 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: Add sanity check to newRevisionFromRowAndSlots. (T286877) (duration: 00m 57s)
  • 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - T286921
  • 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - T286921
  • 18:46 brennen: gerrit1001: restarting gerrit
  • 18:40 vgutierrez: stop pybal on lvs2009 - T286921
  • 18:38 brennen: re-enabling puppet on gerrit1001
  • 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - T286921
  • 18:27 ryankemper: T264053 Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P{relforge*}' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
  • 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
  • 18:27 ryankemper: T264053 Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P{cloudelastic*}' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
  • 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - T286921
  • 18:20 vgutierrez: enabling pybal on lvs2007 - T286921
  • 18:19 ryankemper: T264053 Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P{elastic*}' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
  • 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
  • 18:06 dancy@deploy1002: Synchronized .pipeline: Config: pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately (duration: 00m 56s)
  • 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
  • 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
  • 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
  • 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
  • 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
  • 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
  • 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
  • 17:30 volans: running puppet on elastic2038 after network was restored
  • 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
  • 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
  • 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
  • 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
  • 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
  • 17:23 volans: running authdns-update to force-update authdns2001
  • 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:21 XioNoX: remove ns1 redirect - T286787
  • 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
  • 17:10 XioNoX: enable asw-a2-codfw access ports - T286787
  • 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - T286787
  • 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
  • 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
  • 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
  • 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
  • 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
  • 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
  • 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
  • 16:40 XioNoX: update asw-a2-codfw serial number - T286787
  • 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
  • 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
  • 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
  • 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
  • 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
  • 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
  • 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
  • 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
  • 16:21 mutante: depooled logstash2021 for dcops maintenance work
  • 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
  • 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
  • 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
  • 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
  • 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
  • 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 310be45f7 (duration: 00m 57s)
  • 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
  • 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
  • 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
  • 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
  • 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I2bdfbd258e (duration: 00m 57s)
  • 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I069c7b53 (duration: 00m 58s)
  • 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
  • 15:10 godog: +100G to prometheus/ops in codfw
  • 14:59 vgutierrez: rolling restart of eqiad pybal instances
  • 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
  • 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
  • 14:42 vgutierrez: rolling restart of codfw pybal instances
  • 14:33 vgutierrez: rolling restart of eqsin pybal instances
  • 14:23 vgutierrez: rolling restart of ulsfo pybal instances
  • 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
  • 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
  • 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
  • 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
  • 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
  • 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
  • 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
  • 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
  • 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
  • 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
  • 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
  • 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
  • 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
  • 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
  • 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
  • 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
  • 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
  • 11:40 moritzm: installing bluez security updates
  • 11:31 Lucas_WMDE: EU backport+config window done
  • 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Add config for updated PropertySuggester beta cluster (T285098) (beta-only) (duration: 00m 57s)
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
  • 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
  • 09:52 moritzm: imported megacli for bullseye-wikimedia T282272 T275873
  • 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
  • 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
  • 08:15 vgutierrez: depool codfw text traffic
  • 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
  • 03:26 twentyafterfour: restarted phd on phab1001
  • 03:25 twentyafterfour: investigating PHD failure

2021-07-16

  • 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
  • 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
  • 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
  • 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
  • 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
  • 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
  • 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P{elastic2*}' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
  • 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
  • 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
  • 15:48 vgutierrez: restart pybal on lvs2010
  • 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 15:24 godog: downtime flappy pages in codfw for 40 minutes
  • 15:14 godog: set alert2001 as active in netbox (was staged) - T247966
  • 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
  • 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
  • 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw T286787
  • 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
  • 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
  • 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
  • 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
  • 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
  • 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
  • 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
  • 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
  • 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers (T279309)
  • 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
  • 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
  • 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
  • 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
  • 12:39 mutante: mw1412 through mw1428 - set to active in netbox (T279309)
  • 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
  • 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
  • 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
  • 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
  • 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
  • 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
  • 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
  • 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
  • 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
  • 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
  • 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
  • 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
  • 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
  • 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
  • 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
  • 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
  • 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
  • 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for T273281
  • 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for T273281
  • 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for T273281
  • 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for T273281
  • 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures T286763', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
  • 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied T132839 workarounds)

2021-07-15

  • 23:32 brennen: checking stashbot: T286756
  • 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: Fix creation of mw.Message objects (T286385) (duration: 00m 57s)
  • 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # T285811
  • 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # T285811
  • 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # T285811
  • 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
  • 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki T284928
  • 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
  • 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
  • 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
  • 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
  • 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
  • 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
  • 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
  • 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: eebdc4d “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
  • 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: eebdc4d “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
  • 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: T286611 (duration: 01m 06s)
  • 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: T286611 (duration: 01m 07s)
  • 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
  • 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
  • 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
  • 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
  • 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
  • 16:40 ejegg: updated payments-wiki from d9892207c1 to 844b59ee42
  • 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
  • 16:27 ejegg: updated fundraising CiviCRM from e0d53c92b5 to 20e9ef6bbb
  • 16:24 ejegg: updated payments-wiki from 0e7800027a to 844b59ee42
  • 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
  • 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for T273281
  • 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for T273281
  • 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for T273281
  • 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for T273281
  • 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for T273281
  • 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for T273281
  • 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs: Allow admins of idwiki to change stablesettings (T268317), try II (duration: 01m 05s)
  • 15:03 Amir1: temporarily becoming admin on idwiki to debug T268317
  • 15:02 moritzm: installing nginx security updates on ms-fe*
  • 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for T273281
  • 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for T273281
  • 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
  • 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
  • 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for T273281
  • 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for T273281
  • 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade T273281
  • 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade T273281
  • 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
  • 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
  • 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
  • 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for T273281
  • 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for T273281
  • 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade T273281
  • 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade T273281
  • 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
  • 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
  • 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
  • 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
  • 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
  • 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
  • 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
  • 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet
  • 13:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
  • 13:05 mutante: mw1413 - pooling; was depooled for an unknown reason, don't see it in SAL, looks ok, scap pulled
  • 13:03 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1422.eqiad.wmnet
  • 13:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for T273281
  • 13:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for T273281
  • 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade T273281
  • 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade T273281
  • 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for T273281
  • 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for T273281
  • 12:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[3-5].eqiad.wmnet
  • 12:54 mutante: mw1423, mw1424, mw1425 - pooled as new API servers
  • 12:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
  • 12:53 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[3-5].eqiad.wmnet
  • 12:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
  • 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade T273281
  • 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade T273281
  • 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for T273281
  • 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for T273281
  • 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade T273281
  • 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade T273281
  • 12:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade T273281
  • 12:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade T273281
  • 12:34 mutante: mw1423, mw1424, mw1425 - scap pull
  • 12:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 12:09 mutante: mw1423,mw1424,mw1425 - rebooting
  • 11:48 moritzm: restarting restbase1028-1030 to pick up libuv security update
  • 11:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
  • 11:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
  • 11:47 mutante: mw1423, mw1424, mw1425 - initial puppet run, new API appservers going into production
  • 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Make idwiki use protect mode of flaggedrevs (T268317) (duration: 01m 07s)
  • 11:40 moritzm: restarting Etherpad to pick up libuv security update
  • 11:37 moritzm: restarting Turnilo to pick up libuv security update
  • 11:34 moritzm: installing libuv1 security updates
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 10 hosts
  • 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
  • 11:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
  • 11:05 volans@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
  • 10:56 volans: commented out cron-spam entries on thanos-fe2001, puppet is disabled, thanos-store.service fails to start - T285835
  • 10:41 godog: move wikibase.queryService.ui.app to wikibase.queryService.ui.index.app - T272128
  • 10:34 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 10:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 10:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 10:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 10:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 10:26 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
  • 10:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
  • 10:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 10:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:02 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 10:02 effie: disabling puppet on maps* for 704394
  • 09:38 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 09:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 09:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 T278619
  • 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 T278619
  • 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 T278619
  • 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 T278619
  • 09:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 T278619
  • 09:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 T278619
  • 09:11 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-8].eqiad.wmnet
  • 09:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T278619
  • 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T278619
  • 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T278619
  • 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T278619
  • 08:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T278619
  • 08:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T278619
  • 08:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T278619
  • 08:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T278619
  • 08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 08:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 08:29 elukey: sudo rm /etc/rawdog/en/feeds/847a7185.state* on planet1002 (corrupted file) - backup in /home/elukey + restart planet-update-en.service
  • 08:12 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-8].eqiad.wmnet
  • 08:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
  • 08:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
  • 07:48 moritzm: updated bullseye d-i image for latest daily build T275873
  • 07:31 godog: reimage thanos-fe2001 with bullseye - T285835
  • 07:23 elukey: restart planet-update-en.service on planet1002
  • 07:17 elukey: remove /etc/rawdog/en/{state,state.lock} on planet1002 (following what rawdog suggested) due to corrupted files (backups available in /home/elukey/en)
  • 06:51 elukey: restart phabricator_clean_tmp_files.service on phab1001 - transient error (tmp files already cleaned up)
  • 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/UserOptionsManager.php: don't delete non-existent rows (T286521) (duration: 01m 06s)
  • 06:47 tstarling@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: don't delete non-existent rows (T286521) (duration: 01m 07s)
  • 05:50 kart_: Updated cxserver to 2021-07-14-124232-production (T282369, T284450)
  • 05:47 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:43 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:41 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 00:00 twentyafterfour: phabricator update deployed.

2021-07-14

  • 23:23 eileen: civicrm revision changed from b1c63470bb to e0d53c92b5, config revision is bb405c5232
  • 21:19 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.14 (duration: 01m 05s)
  • 21:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.14
  • 21:08 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/User.php: Backport: Move saving user options to onTransactionPreCommitOrIdle (T286521) (duration: 01m 05s)
  • 20:58 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/User.php: Backport: Move saving user options to onTransactionPreCommitOrIdle (T286521) (duration: 01m 05s)
  • 20:51 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.14
  • 19:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/resources: Backport: Fix deprecated offset() on invalid DOM (T185629) (duration: 01m 07s)
  • 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@156a984]: fix trove-dashboard bug (duration: 04m 18s)
  • 19:26 andrew@deploy1002: Started deploy [horizon/deploy@156a984]: fix trove-dashboard bug
  • 19:17 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 19:17 nskaggs@cumin1001: Added views for new wiki: dagwiki T284456
  • 18:55 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 18:54 nskaggs@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
  • 18:54 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 18:36 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 18:36 nskaggs@cumin1001: Added views for new wiki: banwikisource T284390
  • 18:30 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 18:14 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 17:52 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
  • 17:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
  • 17:49 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
  • 17:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
  • 17:39 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 17:35 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 17:35 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: Do not lock preferences row for a rememberpassword check (T286521) (duration: 01m 06s)
  • 17:00 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: Do not lock preferences row for a rememberpassword check (T286521) (duration: 01m 05s)
  • 16:27 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
  • 16:26 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 16:11 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Translate: Backport: TranslationAid: Handle empty message definition (T285830) and TranslationAid: Make sure to return successfully fetched definitions (T285830) (duration: 01m 09s)
  • 16:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 15:37 moritzm: installing klibc security updates
  • 15:36 ottomata: deploying eventgate-analytics with direct service-runner prometheus support
  • 15:34 ryankemper: [Elastic] Manually triggering readahead mitigation across whole fleet to prevent any further issues today: `ryankemper@cumin1001:~$ sudo cumin -b 12 'P{elastic*}' 'sudo systemctl restart elasticsearch-disable-readahead.service'` (still need to investigate why `elasticsearch-disable-readahead.timer` isn't re-firing every 30 mins as desired)
  • 15:34 moritzm: installing apache security updates on otrs1001 (ticket.wikimedia.org)
  • 15:34 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 15:28 urbanecm: Start server-side upload of 3 large image files (T285708)
  • 15:16 moritzm: installing apache security updates on lists1001 (lists.wikimedia.org)
  • 14:51 moritzm: installing apache security updates on puppet masters
  • 14:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2384.codfw.wmnet
  • 14:47 effie: set mw2384 as inactive to investigate mw2383 issue - T286463
  • 14:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:44 moritzm: installing apache security updates on grafana*
  • 14:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:43 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:40 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:40 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1422.eqiad.wmnet
  • 14:33 dcausse: running elasticsearch-madvise-random ES_PID on elastic2045
  • 14:31 dcausse: running elasticsearch-madvise-random 1022 on elastic2054
  • 14:23 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:19 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:13 elukey: restart php-fpm on mw2370
  • 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
  • 13:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 13:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277118
  • 13:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277118
  • 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1005.eqiad.wmnet
  • 12:43 urbanecm: Start server-side upload of 3 large image files (T285708)
  • 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1005.eqiad.wmnet
  • 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
  • 12:23 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 12:15 mutante: mw1422 - scap pull
  • 12:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1422.eqiad.wmnet
  • 12:02 moritzm: upgrading python3-wmflib fleetwide to 0.0.8 (needed for new logout.d wrapper)
  • 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
  • 12:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
  • 11:52 mutante: mw1422 - new setup, not in prod yet
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
  • 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
  • 11:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
  • 11:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 11:49 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: Remove reviewer user group in ruwiki (T284589) (duration: 01m 05s)
  • 11:40 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
  • 11:39 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs: Reduce levels for ruwiki to 1 (T284589) (duration: 01m 05s)
  • 11:37 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
  • 11:23 ariel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet
  • 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 72027e1: Disable indexing in NS_USER and NS_USER_TALK on bnwiki (T286152) (duration: 02m 07s)
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4dc11d2: Change category name of Babel extension on Javanese Wikipedia (T286165) (duration: 02m 10s)
  • 10:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
  • 09:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277118
  • 09:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277118
  • 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277118
  • 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277118
  • 09:27 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php-1.37.0-wmf.14]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=testwiki # T285811
  • 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277118
  • 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277118
  • 07:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277118
  • 07:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277118
  • 07:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277118
  • 07:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277118
  • 07:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T277118
  • 07:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T277118
  • 00:58 eileen: process control updated to c291b3c
  • 00:58 eileen: c291b3c
  • 00:49 eileen: civicrm revision changed from bb62188ec6 to b1c63470bb, config revision is c291b3c689
  • 00:48 eileen: process-control config revision is c291b3c689
  • 00:15 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fix conf cache conflict with Defines.php noticed in beta (duration: 02m 09s)

2021-07-13

  • 23:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: f362736: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector (T286587) (duration: 02m 08s)
  • 23:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: f362736: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector (T286587) (duration: 02m 07s)
  • 23:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
  • 23:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
  • 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
  • 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
  • 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
  • 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
  • 22:22 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
  • 22:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
  • 22:18 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Use Score with lilypond's safe mode only (duration: 02m 06s)
  • 20:53 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 20:30 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/skins/Skin.php: Backport: links is flat array (T286040) (duration: 02m 07s)
  • 20:26 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.9 (duration: 04m 21s)
  • 20:19 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.14 (duration: 31m 56s)
  • 19:47 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.14
  • 19:02 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1283.eqiad.wmnet
  • 17:45 mutante: mw1283 - decom - powered off by cookbook
  • 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1283.eqiad.wmnet
  • 17:41 mutante: homer "asw2-a*eqiad*" commit "decom mw1282 - T280203"
  • 17:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
  • 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
  • 17:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
  • 17:09 mutante: mw1282 - decom, powered off
  • 17:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
  • 17:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1281.eqiad.wmnet
  • 17:05 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: Backport: Do not lock user_preferences before updating (T286521) (duration: 01m 58s)
  • 16:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Firmware upgrade T286226
  • 16:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Firmware upgrade T286226
  • 16:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade T286226
  • 16:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade T286226
  • 16:55 jbond: upload statograph to buster wikimedia
  • 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1281.eqiad.wmnet
  • 16:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom T28203
  • 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom T28203
  • 16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom T28203
  • 16:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom T28203
  • 16:25 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw128[1-3].eqiad.wmnet
  • 15:34 topranks: Adding IX peering to AS393950 (Xiber LLC) on cr2-eqiad.
  • 15:20 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 15:19 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 14:52 volker-e@deploy1002: Finished deploy [design/style-guide@5c07233]: Deploy design/style-guide: 5c07233 “Components”: Add WikimediaUI theme Figma links to various components (#483) (duration: 00m 06s)
  • 14:52 volker-e@deploy1002: Started deploy [design/style-guide@5c07233]: Deploy design/style-guide: 5c07233 “Components”: Add WikimediaUI theme Figma links to various components (#483)
  • 14:35 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 14:35 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 13:57 otto@deploy1002: Finished deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job - T271232 (duration: 03m 28s)
  • 13:53 otto@deploy1002: Started deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job - T271232
  • 13:37 effie: rolling restart php-fpm across clusters - T286260
  • 13:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/lib/includes/SimpleCacheWithBagOStuff.php: Backport: Send TTL instead of expiry in unix timestamp in calling BagOStuff (T286260) (duration: 00m 58s)
  • 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
  • 13:29 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
  • 13:14 kormat: restarted replication on db1117:3325 T284622
  • 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 1732 hosts
  • 13:10 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
  • 13:10 hashar: Upgraded Apache on gerrit1001 and gerrit2001
  • 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
  • 13:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
  • 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
  • 12:53 kormat: stopping replication on db1117:3325 T284622
  • 12:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 T284622
  • 12:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 T284622
  • 12:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
  • 12:41 mutante: depooling and decom'ing eqiad API servers mw1281, mw1282, mw1283 - T280203
  • 12:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[1-3].eqiad.wmnet
  • 12:20 mutante: mwmaint1002 - scap pull after reimaging
  • 11:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
  • 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
  • 11:28 Lucas_WMDE: EU backport+config window done
  • 11:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Remove obsolete $wgShowDBErrorBacktrace config (duration: 01m 25s)
  • 11:13 mutante: mwmaint1002 - reimaging with buster (T267607)
  • 10:54 mutante: switching https://noc.wikimedia.org backend from eqiad to codfw for mwmaint1002 OS upgrade, not affecting config-master/pybal, tests passed (T267607)
  • 10:44 moritzm: upgrading apache on phab1001 (phabricator.wikimedia.org)
  • 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
  • 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
  • 10:39 hnowlan: running `nodetool decommission` on maps2008
  • 10:27 moritzm: installing apache security updates on alert1001 (icinga.wikimedia.org)
  • 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277116
  • 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277116
  • 10:18 moritzm: installing apache security updates on Logstash hosts
  • 09:58 moritzm: upgrading PHP/Apache on matomo1002 (piwik.wikimedia.org)
  • 09:40 moritzm: installing apache security updates on thanos-fe hosts
  • 09:38 moritzm: installing apache security updates on parsoid hosts
  • 09:31 effie: depool mw2383 T286463
  • 09:18 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 09:15 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277116
  • 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277116
  • 08:59 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
  • 08:59 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
  • 08:45 effie: depool mw2383 - T286463
  • 08:02 moritzm: upgrade bullseye pilot installs to latest state of bullseye
  • 07:06 moritzm: installing apache security updates on codfw mw* hosts
  • 06:53 elukey: systemctl reset-failed ifup@ens5 on gitlab2001 - T273026
  • 06:06 effie: pool mw2383 - T286463
  • 04:09 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 04:05 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 08m 28s)
  • 03:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
  • 03:55 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 02m 22s)
  • 03:54 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.76` on canary `wdqs1003`; proceeding to rest of fleet
  • 03:53 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
  • 03:53 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.76`. Pre-deploy tests passing on canary `wdqs1003`

2021-07-12

  • 23:57 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1896efc: Add sayahna.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T286163) (duration: 00m 56s)
  • 23:51 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=T286396 # T286396
  • 23:50 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # T286396
  • 23:50 urbanecm: Delete Project:BROKENPesak at sr.wikipedia to be able to rerun namespaceDupes.php (T286396)
  • 23:45 urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # T286396
  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 284216a: Add few namespace aliases for Serbian Wikipedia (T286396) (duration: 00m 56s)
  • 23:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8a79bf7: enwiki: Delete Book namespace (T285766) (duration: 00m 57s)
  • 23:29 urbanecm@deploy1002: Synchronized static/images/: d007b9c: Remove unused celebration logos and wordmark (T286380) (duration: 00m 57s)
  • 23:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6c58149: Add editautoreviewprotected to bot on hewikisource (T275076) (duration: 00m 57s)
  • 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 40eade4: Enable RelatedArticles Extension in zhwikinews (T266933) (duration: 00m 57s)
  • 23:15 urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=zhwiktionary --fix --add-prefix=BROKEN # T286101, P16817
  • 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5ab00d1: zhwiktionary: Add templateeditor right (T286101) (duration: 00m 57s)
  • 23:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5822b2b: zhwiktionary: Add aliases for namespaces (T286101) (duration: 00m 57s)
  • 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ba0967f: zhwiktionary: Add Reconstruction namespace (T286101) (duration: 00m 57s)
  • 22:53 legoktm: root@urldownloader2002:/var/cache/apt# rm -rf * to free up space
  • 21:26 urbanecm: Start server-side upload for 2 video files (T286432, T286433)
  • 18:41 otto@deploy1002: Finished deploy [analytics/refinery@200b502]: Finalize event_default gobblin job - T271232 (duration: 03m 39s)
  • 18:37 otto@deploy1002: Started deploy [analytics/refinery@200b502]: Finalize event_default gobblin job - T271232
  • 18:12 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score using Shellbox on testwiki (T257066) (duration: 00m 58s)
  • 16:15 ppchelko@deploy1002: Finished deploy [restbase/deploy@b05ade3]: Add newly created wikis T284929 T284457 T284392 (duration: 21m 24s)
  • 16:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116 - extending downtime
  • 16:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116 - extending downtime
  • 15:54 ppchelko@deploy1002: Started deploy [restbase/deploy@b05ade3]: Add newly created wikis T284929 T284457 T284392
  • 15:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116
  • 15:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116
  • 15:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277116
  • 15:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277116
  • 15:24 elukey: expand ML k8s iBGP neighbors to include the master nodes (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/704104)
  • 15:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277116
  • 15:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277116
  • 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1002.wikimedia.org
  • 15:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277116
  • 15:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277116
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1002.wikimedia.org
  • 14:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change T277116
  • 14:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change T277116
  • 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1001.wikimedia.org
  • 14:44 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1001.wikimedia.org
  • 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2004.wikimedia.org
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2004.wikimedia.org
  • 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2003.wikimedia.org
  • 14:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2003.wikimedia.org
  • 14:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
  • 13:59 otto@deploy1002: Finished deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo - T271232 (duration: 03m 30s)
  • 13:56 otto@deploy1002: Started deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo - T271232
  • 13:52 otto@deploy1002: Finished deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - T271232 (duration: 03m 16s)
  • 13:49 otto@deploy1002: Started deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - T271232
  • 13:36 otto@deploy1002: Finished deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - T271232 (duration: 03m 37s)
  • 13:32 otto@deploy1002: Started deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - T271232
  • 12:51 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:48 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 12:42 volans: reverting Primary IP allocation for pc1011-1014, leaving only mgmt IPs - T282484
  • 12:34 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps2004.codfw.wmnet
  • 11:58 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Enable template search improvements on first wikis 2/2 (T284553) (duration: 00m 57s)
  • 11:54 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable template search improvements on first wikis 1/2 (T284553) (duration: 00m 56s)
  • 11:49 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/VisualEditor/modules/ve-mw/ui/widgets/ve.ui.MWTemplateTitleInputWidget.js: Backport: Always add 1 prefixsearch match when searching for templates (duration: 00m 57s)
  • 11:47 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps100[1-4].eqiad.wmnet
  • 11:45 hnowlan: adjusting weights of eqiad maps servers to reduce load on older spec machines
  • 11:40 moritzm: installing apache updates on mw1/eqiad hosts
  • 11:38 hnowlan: adjusting weights of codfw maps servers to reduce load on older spec machines
  • 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2004.codfw.wmnet
  • 11:34 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 773c956: Revert "Use ptwiki 20th anniversary logos" (T286380) (duration: 00m 57s)
  • 11:34 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2003.codfw.wmnet
  • 11:33 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2001.codfw.wmnet
  • 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cd5f537: Revert "ptwiki: Use celebration logos in new vector" (T286380) (duration: 00m 57s)
  • 11:26 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add 'editautoreviewprotected' protection level to hewikisource (T275076) (duration: 00m 57s)
  • 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
  • 11:19 hnowlan: testing a depool of maps2010 to ensure kartotherian load can cope with two less nodes
  • 11:12 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable transclusion back button on first wikis (T284553) (duration: 00m 58s)
  • 11:01 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2008.codfw.wmnet
  • 10:58 hnowlan: testing a depool of maps2008 to ensure kartotherian load can cope with one less node
  • 10:30 moritzm: installing apache updates on an-tool* hosts (affects Turnilo, Yarn, Superset, Hue) briefly
  • 10:11 elukey: add 10g disk to ml-serve-ctrl[12]00[12] for T285927
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
  • 10:05 mutante: planet - deleting state files, manually running update for all 161 en feeds - T285251
  • 10:03 effie: depool mw2383
  • 10:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
  • 10:01 godog: test thanos-compact upload with smaller part size - T285835
  • 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
  • 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1006.eqiad.wmnet
  • 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
  • 09:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
  • 09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1006.eqiad.wmnet
  • 09:07 godog: repool thanos-fe2002 - T285835
  • 08:38 godog: test a single frontend for thanos-swift / thanos-query to test "bad host" theory - T285835
  • 08:26 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/client: Backport: Remove subscribing to other aspect for entity usage (T286193) (duration: 00m 59s)
  • 07:44 jynus: restart db1102:x1 mariadb instance
  • 07:01 moritzm: installing apache2 security updates
  • 05:14 Amir1: start of mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --batch-size=10 --verbose --mime="application/pdf" --force --sleep 5 on screen - It will take days / week to finish (T275268)
  • 05:06 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: Enable json image metadata everywhere (T275268) (duration: 01m 05s)
  • 04:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/maintenance/refreshImageMetadata.php: Backport: Add --sleep option to refreshImageMetadata.php (duration: 01m 04s)
  • 04:10 Amir1: mwscript refreshImageMetadata.php --wiki=testcommonswiki --mediatype=OFFICE --batch-size=20 --verbose --mime="application/pdf" --force (T275268)
  • 04:08 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: Set testcommonswiki to use json image metadata (T275268) (duration: 01m 10s)

2021-07-09

  • 23:28 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 23:27 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 22:36 legoktm: running benchmarking scripts against shellbox
  • 14:49 otto@deploy1002: Finished deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - T271232 (duration: 03m 08s)
  • 14:46 otto@deploy1002: Started deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - T271232
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1118', diff saved to https://phabricator.wikimedia.org/P16809 and previous config saved to /var/cache/conftool/dbconfig/20210709-115609-marostegui.json
  • 11:40 _joe_: deleting coredns pod in codfw, potentially causing T286360
  • 10:13 _joe_: recreated all pods for zotero in codfw
  • 00:47 legoktm: zotero rolling restart didn't help, filed T286360 for DNS issues
  • 00:39 legoktm: doing a rolling restart of zotero in codfw to hopefully fix DNS ENOTFOUND issues

2021-07-08

  • 22:48 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Add configuration to use Score with Shellbox (still disabled) (2/2) - T281423 (duration: 00m 57s)
  • 22:46 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add configuration to use Score with Shellbox (still disabled) (1/2) - T281423 (duration: 00m 58s)
  • 19:29 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/includes/Score.php: Allow setting a different path for `convert` just for Score (2/2) (duration: 00m 57s)
  • 19:27 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/extension.json: Allow setting a different path for `convert` just for Score (1/2) (duration: 00m 58s)
  • 18:56 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:55 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:53 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 17:02 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1] (duration: 05m 38s)
  • 16:56 joal@deploy1002: Started deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1]
  • 16:47 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1] (duration: 03m 17s)
  • 16:44 joal@deploy1002: Started deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1]
  • 15:37 otto@deploy1002: Finished deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - T271232 (duration: 03m 06s)
  • 15:34 otto@deploy1002: Started deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - T271232
  • 15:29 otto@deploy1002: Finished deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - T271232 (duration: 05m 27s)
  • 15:23 otto@deploy1002: Started deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - T271232
  • 15:11 otto@deploy1002: Finished deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - T271232 (duration: 05m 42s)
  • 15:05 otto@deploy1002: Started deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - T271232
  • 14:52 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add consumers.analytics_hadoop-ingestion stream config settings for automated gobblin imports - T271232 T273901 (duration: 01m 09s)
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16807 and previous config saved to /var/cache/conftool/dbconfig/20210708-134421-root.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16806 and previous config saved to /var/cache/conftool/dbconfig/20210708-132917-root.json
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16805 and previous config saved to /var/cache/conftool/dbconfig/20210708-131414-root.json
  • 13:04 otto@deploy1002: Finished deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - T271232 (duration: 03m 22s)
  • 13:01 otto@deploy1002: Started deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - T271232
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16804 and previous config saved to /var/cache/conftool/dbconfig/20210708-125910-root.json
  • 12:52 moritzm: installing klibc security updates on buster
  • 12:38 moritzm: installing openexr security updates
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103', diff saved to https://phabricator.wikimedia.org/P16803 and previous config saved to /var/cache/conftool/dbconfig/20210708-105353-marostegui.json
  • 10:20 jbond: upgrade golang-cfssl
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16802 and previous config saved to /var/cache/conftool/dbconfig/20210708-100947-root.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16801 and previous config saved to /var/cache/conftool/dbconfig/20210708-095443-root.json
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16800 and previous config saved to /var/cache/conftool/dbconfig/20210708-093939-root.json
  • 09:25 jbond: upload golang-github-cloudflare-cfssl_1.6.0-1_amd64 to bullseye
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16799 and previous config saved to /var/cache/conftool/dbconfig/20210708-092436-root.json
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2116', diff saved to https://phabricator.wikimedia.org/P16798 and previous config saved to /var/cache/conftool/dbconfig/20210708-092411-marostegui.json
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16797 and previous config saved to /var/cache/conftool/dbconfig/20210708-090456-root.json
  • 09:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16796 and previous config saved to /var/cache/conftool/dbconfig/20210708-084952-root.json
  • 08:50 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:42 moritzm: imported ganeti 2.16.0 for stretch-security/component/ganeti216 T284811
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16795 and previous config saved to /var/cache/conftool/dbconfig/20210708-083449-root.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16794 and previous config saved to /var/cache/conftool/dbconfig/20210708-081945-root.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2130', diff saved to https://phabricator.wikimedia.org/P16793 and previous config saved to /var/cache/conftool/dbconfig/20210708-081922-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16792 and previous config saved to /var/cache/conftool/dbconfig/20210708-060812-root.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16791 and previous config saved to /var/cache/conftool/dbconfig/20210708-055309-root.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16790 and previous config saved to /var/cache/conftool/dbconfig/20210708-053805-root.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16789 and previous config saved to /var/cache/conftool/dbconfig/20210708-052302-root.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092', diff saved to https://phabricator.wikimedia.org/P16788 and previous config saved to /var/cache/conftool/dbconfig/20210708-052216-marostegui.json
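
The staged repools logged above step through 25%, 50%, 75% and 100%, and that cadence can be read back out of the entries themselves. The following is a minimal Python sketch, assuming entries keep the "(re)pooling @ N%" wording seen in this log. The entry strings are abbreviated copies of the db2092 lines, and the names `STEP`, `repool_steps` and `minutes_between` are illustrative only; they are not part of dbctl or any Wikimedia tooling.

```python
import re
from datetime import datetime

# Abbreviated copies of the db2092 entries from 2021-07-08, oldest first.
ENTRIES = [
    "05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092'",
    "05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 25%: Repool after index change'",
    "05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 50%: Repool after index change'",
    "05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 75%: Repool after index change'",
    "06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 100%: Repool after index change'",
]

# Hypothetical pattern: HH:MM timestamp, then the quoted host and percentage.
STEP = re.compile(r"(\d{2}:\d{2}) .*?'(\S+) \(re\)pooling @ (\d+)%")


def repool_steps(entries):
    """Return (host, percent, time) tuples for each '(re)pooling @ N%' entry."""
    steps = []
    for entry in entries:
        match = STEP.search(entry)
        if match:
            clock, host, percent = match.groups()
            steps.append((host, int(percent), datetime.strptime(clock, "%H:%M")))
    return steps


def minutes_between(steps):
    """Return the gap in whole minutes between consecutive steps."""
    times = [when for _, _, when in steps]
    return [
        int((later - earlier).total_seconds() // 60)
        for earlier, later in zip(times, times[1:])
    ]


steps = repool_steps(ENTRIES)
print([percent for _, percent, _ in steps])
print(minutes_between(steps))
```

The sketch only compares clock times within a single day, so a repool that crosses midnight would need the date heading attached to each entry as well.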

2021-07-07

  • 20:22 legoktm: repooling eqiad - https://gerrit.wikimedia.org/r/703561
  • 18:06 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add Shellbox to {Production,Labs}Services.php (2/2) (duration: 00m 59s)
  • 18:05 legoktm@deploy1002: Synchronized wmf-config/LabsServices.php: Add Shellbox to {Production,Labs}Services.php (1/2) (duration: 00m 59s)
  • 18:04 otto@deploy1002: Finished deploy [analytics/refinery@46c0b84] (hadoop-test): Deploy for gobblin migration - Refine now supports gzip - T271232 (duration: 05m 28s)
  • 17:59 legoktm@deploy1002: Synchronized private/readme.php: Document $wgShellboxSecretKey in private/readme.php (duration: 01m 01s)
  • 17:58 otto@deploy1002: Started deploy [analytics/refinery@46c0b84] (hadoop-test): Deploy for gobblin migration - Refine now supports gzip - T271232
  • 17:54 otto@deploy1002: Finished deploy [analytics/refinery@46c0b84]: Deploy for gobblin migration - Refine now supports gzip - T271232 (duration: 17m 22s)
  • 17:36 otto@deploy1002: Started deploy [analytics/refinery@46c0b84]: Deploy for gobblin migration - Refine now supports gzip - T271232
  • 16:55 joal@deploy1002: Finished deploy [analytics/refinery@b5c4462]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@b5c4462] (duration: 03m 10s)
  • 16:52 joal@deploy1002: Started deploy [analytics/refinery@b5c4462]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@b5c4462]
  • 16:28 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:15 joal@deploy1002: Finished deploy [analytics/refinery@b5c4462] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@b5c4462] (duration: 10m 21s)
  • 16:05 joal@deploy1002: Started deploy [analytics/refinery@b5c4462] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@b5c4462]
  • 16:03 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:01 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:25 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:49 moritzm: installing djvulibre security updates
  • 14:05 _joe_: powercycling mw2267, stuck without network, blank console
  • 13:25 otto@deploy1002: Finished deploy [analytics/refinery@8de71e6] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin dir fixes - T271232 (duration: 05m 41s)
  • 13:19 otto@deploy1002: Started deploy [analytics/refinery@8de71e6] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin dir fixes - T271232
  • 13:13 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:13 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:12 otto@deploy1002: Finished deploy [analytics/refinery@8de71e6]: analytics cluster deploy for webrequest gobblin job migration - T271232 (duration: 03m 11s)
  • 13:09 otto@deploy1002: Started deploy [analytics/refinery@8de71e6]: analytics cluster deploy for webrequest gobblin job migration - T271232
  • 12:12 urbanecm: Start server-side upload for 3 video files (T286173, T286175, T286174)
  • 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx1002.wikimedia.org
  • 11:49 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host mx1002.wikimedia.org
  • 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx2002.wikimedia.org
  • 11:29 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host mx2002.wikimedia.org
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16782 and previous config saved to /var/cache/conftool/dbconfig/20210707-112149-root.json
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16781 and previous config saved to /var/cache/conftool/dbconfig/20210707-110645-root.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16780 and previous config saved to /var/cache/conftool/dbconfig/20210707-105142-root.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16779 and previous config saved to /var/cache/conftool/dbconfig/20210707-103638-root.json
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316', diff saved to https://phabricator.wikimedia.org/P16778 and previous config saved to /var/cache/conftool/dbconfig/20210707-103553-marostegui.json
  • 07:56 moritzm: bounced elasticsearch_5@production-logstash-eqiad on logstash1009
  • 07:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-07-06

  • 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:25 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0] (duration: 05m 31s)
  • 17:20 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0]
  • 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0] (duration: 00m 07s)
  • 17:19 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0]
  • 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0] (duration: 36m 59s)
  • 16:42 joal@deploy1002: Started deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0]
  • 15:54 otto@deploy1002: Finished deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration (duration: 05m 24s)
  • 15:48 otto@deploy1002: Started deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration
  • 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16777 and previous config saved to /var/cache/conftool/dbconfig/20210706-140049-root.json
  • 13:53 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 13:49 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
  • 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16776 and previous config saved to /var/cache/conftool/dbconfig/20210706-134545-root.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16775 and previous config saved to /var/cache/conftool/dbconfig/20210706-133041-root.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16774 and previous config saved to /var/cache/conftool/dbconfig/20210706-131537-root.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16773 and previous config saved to /var/cache/conftool/dbconfig/20210706-120242-root.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P16772 and previous config saved to /var/cache/conftool/dbconfig/20210706-115820-marostegui.json
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16771 and previous config saved to /var/cache/conftool/dbconfig/20210706-115732-marostegui.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16770 and previous config saved to /var/cache/conftool/dbconfig/20210706-114739-root.json
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16769 and previous config saved to /var/cache/conftool/dbconfig/20210706-113235-root.json
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16768 and previous config saved to /var/cache/conftool/dbconfig/20210706-111731-root.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2071', diff saved to https://phabricator.wikimedia.org/P16767 and previous config saved to /var/cache/conftool/dbconfig/20210706-111635-marostegui.json
  • 10:19 moritzm: installing jackson-databind security updates on buster
  • 09:01 _joe_: repooling wdqs1007 now that lag has caught up
  • 08:43 moritzm: installing libuv1 security updates on buster
  • 07:06 marostegui: Upgrade db1104 kernel
  • 06:54 moritzm: installing PHP 7.3 security updates on buster
  • 06:50 marostegui: Upgrade db1122 kernel
  • 06:35 marostegui: Upgrade db1138 kernel
  • 06:31 marostegui: Upgrade db1160 kernel
  • 00:56 eileen: process-control config revision is 8d46b52ed4

2021-07-05

  • 17:40 legoktm: published fixed docker-registry.discovery.wmnet/nodejs10-devel:0.0.4 image (T286212)
  • 15:24 _joe_: leaving wdqs1007 depooled so that the updater can recover faster, now at 16.5 hours of lag
  • 14:01 moritzm: uploaded nginx 1.13.9-1+wmf3 for stretch-wikimedia
  • 12:50 marostegui: Stop MySQL on db1117:3321 to clone db1125 T286042
  • 11:29 moritzm: installing openexr security updates on stretch
  • 11:07 moritzm: installing tiff security updates on stretch
  • 10:48 moritzm: upgrading PHP on miscweb*
  • 10:37 jbond: enable puppet fleet wide post puppetdb change
  • 10:29 marostegui: Optimize ruwiki.logging on s6 eqiad with replication T286102
  • 10:27 jbond: disable puppet fleet wide to perform puppetdb change
  • 08:15 moritzm: rolling out debmonitor-client 0.3.0
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
  • 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
  • 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
  • 07:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
  • 07:04 _joe_: restarting blazegraph, then restarting the updater again
  • 06:48 moritzm: start rasdaemon on sretest1001, didn't start after last reboot from a week ago
  • 06:47 _joe_: restart wdqs-updater on wdqs1007
  • 00:53 eileen: process-control config revision is a1717c7fde
  • 00:47 eileen: process-control config revision is 24565578f7

2021-07-04

2021-07-03

  • 17:46 elukey: depool eqsin due to loss of power redundancy (equinix maintenance) - T286113
  • 09:12 Amir1: restarting mailman3-web on lists1001 to pick up patches for T283659
  • 08:53 Amir1: patching postorius and mailmanclient on lists1001 for T283659

2021-07-02

  • 22:06 foks: removing three files for legal compliance
  • 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:22 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:59 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:54 kormat@cumin1001: START - Cookbook sre.dns.netbox
  • 15:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:17 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dragonfly-supernode1001.eqiad.wmnet
  • 15:07 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
  • 15:05 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dragonfly-supernode1001.eqiad.wmnet
  • 15:02 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
  • 14:54 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
  • 14:53 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
  • 14:52 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
  • 14:40 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[0-1].eqiad.wmnet
  • 14:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-9].eqiad.wmnet
  • 14:38 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
  • 14:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw142[0-1].eqiad.wmnet
  • 14:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-9].eqiad.wmnet
  • 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw142[0-1].eqiad.wmnet
  • 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw141[4-9].eqiad.wmnet
  • 14:15 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw142[0-1].eqiad.wmnet
  • 14:12 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw141[4-9].eqiad.wmnet
  • 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry[2005-2008].codfw.wmnet
  • 13:54 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry[2005-2008].codfw.wmnet
  • 13:32 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=registry200[5-8].codfw.wmnet,dc=codfw,cluster=docker-registry
  • 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 13:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
  • 13:11 mutante: mw2380 - rebooting
  • 13:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
  • 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
  • 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
  • 12:24 moritzm: added btullis to pwstore
  • 12:06 mutante: mw2380 /puppetmaster: reimaged, revoking old cert, signing new cert, initial puppet run T285603
  • 11:51 mutante: mw2380 - PXE booting - does not boot from hard disk
  • 11:28 mutante: powercycling mw2380, trying to make it boot
  • 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 11:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 11:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 10:33 jforrester@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/WikibaseMediaInfo: UploadWizard/WikibaseMediaInfo fix 3fd2873 for T285579 (duration: 00m 59s)
  • 09:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1268.eqiad.wmnet
  • 09:37 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: Fix handling of geEnabled flag (T285996) (duration: 00m 57s)
  • 09:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1268.eqiad.wmnet
  • 09:24 godog: test thanos 0.21.1 locally on thanos-fe2001 and depool the host - T285835
  • 09:19 dcausse: restart blazegraph on wdqs1013
  • 09:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1267.eqiad.wmnet
  • 09:04 mutante: decom'ing mw1267
  • 09:02 moritzm: installing node-hosted-git-info security updates
  • 09:02 tgr: deploying emergency backport: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/702808
  • 08:54 moritzm: installing golang-docker-credential-helpers security updates
  • 08:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1267.eqiad.wmnet
  • 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 08:03 moritzm: installing ipmitool security updates
  • 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1268.eqiad.wmnet
  • 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1267.eqiad.wmnet
  • 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
  • 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
  • 07:25 dcausse: installing openjdk-8-dbg on wdqs1013
  • 03:14 ryankemper: T264053 `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo run-puppet-agent --force'`
  • 03:11 ryankemper: T264053 `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo apt update'` fixed the issue
  • 03:07 ryankemper: T264053 `Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install elasticsearch-madvise' returned 100: Reading package lists...` grr
  • 03:07 ryankemper: T264053 `ryankemper@elastic2054:~$ sudo run-puppet-agent --force`
  • 03:06 ryankemper: T264053 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/702791; will run puppet on single host
  • 03:05 ryankemper: T264053 `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo disable-puppet "verify new deb package works - T264053"'`
  • 03:02 legoktm: uploaded elasticsearch-madvise_0.1~deb9u1_amd64.changes to stretch-wikimedia on apt1001
  • 01:47 eileen: civicrm revision changed from e07c2be1a7 to bb62188ec6, config revision is 1739c53fcb
  • 01:16 legoktm: uploaded elasticsearch-madvise 0.1 to apt.wm.o (T264053)
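
Note on host selectors: several entries above use a bracketed numeric range such as `mw[1420-1421].eqiad.wmnet`. The following is a minimal sketch of how such a range expands into individual hostnames. It is an illustration only, not the parser the tooling actually uses: `expand_hosts` is a hypothetical helper, and it assumes only the simple `[low-high]` form seen in this log (no comma lists or other syntax).

```python
import re

# Matches one numeric range such as "[1420-1421]".
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(pattern):
    """Expand every "[low-high]" range in pattern into explicit hostnames.

    Hypothetical helper for illustration; handles only plain numeric ranges.
    """
    match = RANGE.search(pattern)
    if match is None:
        return [pattern]  # nothing left to expand
    low, high = match.group(1), match.group(2)
    width = len(low)  # keep zero padding if the lower bound has any
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    hosts = []
    for number in range(int(low), int(high) + 1):
        # Recurse so that patterns with more than one range are expanded too.
        hosts.extend(expand_hosts(f"{prefix}{number:0{width}d}{suffix}"))
    return hosts


if __name__ == "__main__":
    for host in expand_hosts("mw[1420-1421].eqiad.wmnet"):
        print(host)
```

A name without brackets, such as `mw1267.eqiad.wmnet`, is returned unchanged as a one-element list.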

2021-07-01

  • 23:29 thcipriani@deploy1002: Synchronized README: Config: Revert "deployment training: readme whitespace" (duration: 00m 56s)
  • 23:21 thcipriani@deploy1002: Synchronized README: Config: deployment training: readme whitespace (duration: 00m 57s)
  • 22:37 urbanecm: Start server-side upload for 1 video file (T285182)
  • 22:36 urbanecm: Start server-side upload for 1 video file (T285789)
  • 22:31 dancy@deploy1002: Synchronized .pipeline: Config: Use train-versions.json to map from version to image tag (T282824) (duration: 00m 57s)
  • 22:27 urbanecm: Start server-side upload for 1 video file (T285682)
  • 21:43 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: Temporarily disable notification for security patch failures (duration: 00m 57s)
  • 19:45 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.12
  • 19:41 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 12s)
  • 19:39 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
  • 19:35 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/tests/phpunit/includes/TitleMethodsTest.php: Backport: Consistently normalize Title::mFragment before setting (T285951) (duration: 01m 10s)
  • 19:34 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/Title.php: Backport: Consistently normalize Title::mFragment before setting (T285951) (duration: 01m 10s)
  • 19:18 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/.pipeline/config.yaml: Backport: Trigger update-train-versions job at end of wmf-publish pipeline (duration: 01m 08s)
  • 18:55 otto@deploy1002: Finished deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883] (duration: 05m 19s)
  • 18:50 otto@deploy1002: Started deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883]
  • 18:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7995f7a: Use Vue.js for QuickSurveys on available wikis (T285890) (duration: 01m 09s)
  • 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: 654877f: EventDispatcher: Ensure we fetch page content from the primary database (T285895) (duration: 01m 12s)
  • 18:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: 6d90430: EventDispatcher: Ensure we fetch page content from the primary database (T285895) (duration: 01m 14s)
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:28 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.12"
  • 16:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:23 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: T285959 (duration: 01m 20s)
  • 16:11 vgutierrez: restart varnish-fe on cp3059 - T285953
  • 14:58 papaul: poweroff mw2380 for disk replacement
  • 14:57 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
  • 14:53 effie: depool mw2380 for disk repair - T285603
  • 14:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:51 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:45 moritzm: installing glib2.0 security updates on buster
  • 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts maps2002.codfw.wmnet
  • 13:35 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts maps2002.codfw.wmnet
  • 13:03 marostegui: Deploy schema change on s2 eqiad master T276150
  • 12:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1266.eqiad.wmnet
  • 12:39 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1266.eqiad.wmnet
  • 12:37 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 12:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1264-1265].eqiad.wmnet
  • 12:23 tgr: EU deploys done
  • 12:22 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/: Backport: Welcome tour: Mark as complete when notice is shown (T284800) SuggestedEdits: Return default JS data as 'noresults' (T285906) (duration: 01m 08s)
  • 12:20 tgr@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: Backport: Welcome tour: Mark as complete when notice is shown (T284800) SuggestedEdits: Return default JS data as 'noresults' (T285906) (duration: 01m 09s)
  • 12:19 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1264-1265].eqiad.wmnet
  • 12:09 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1263.eqiad.wmnet
  • 11:58 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1263.eqiad.wmnet
  • 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/: Backport: Stop using legacy entityNamespaces setting in onSetupAfterCache hook (T285472) (duration: 01m 15s)
  • 11:46 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1262.eqiad.wmnet
  • 11:35 elukey: reboot ml-serve-ctrl200[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
  • 11:35 marostegui: Deploy schema change on s8 eqiad master T276150
  • 11:33 elukey: reboot ml-serve-ctrl100[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
  • 11:33 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1262.eqiad.wmnet
  • 11:19 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Avoid using MWNamespace (duration: 01m 06s)
  • 11:07 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 10:27 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 10:05 moritzm: installing remaining libgcrypt20 security updates
  • 09:56 moritzm: installing remaining gnutls28 security updates
  • 09:55 Amir1: start of clean up of autoreview logs in ruwiki (T285608)
  • 09:47 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:36 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:36 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:35 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:35 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:05 marostegui: Deploy schema change on s1 eqiad (db1157) master T277123
  • 08:52 marostegui: Deploy schema change on s1 eqiad (db1163) master T277123
  • 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1261.eqiad.wmnet
  • 08:28 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1261.eqiad.wmnet
  • 08:23 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw126[2-6].eqiad.wmnet
  • 08:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw126[2-6].eqiad.wmnet
  • 08:13 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1261.eqiad.wmnet
  • 08:11 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
  • 07:06 marostegui: Deploy schema change on s4 eqiad (db1138) master T277123
  • 06:34 marostegui: Deploy schema change on s7 eqiad (db1136) master T277123
  • 06:31 marostegui: Deploy schema change on s2,s8 eqiad masters T277123
  • 05:57 marostegui: Deploy schema change on s5 eqiad master (db1130) T277123
  • 05:55 marostegui: Deploy schema change on s6 eqiad master (db1173) T277123
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P16750 and previous config saved to /var/cache/conftool/dbconfig/20210701-055243-marostegui.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P16749 and previous config saved to /var/cache/conftool/dbconfig/20210701-052702-marostegui.json
  • 04:48 marostegui: Disconnect eqiad -> codfw replication from s1-s8
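
Note on cookbook run times: the START and END entries for a cookbook run can be paired to see how long each run took. The sketch below does this for the `sre.hosts.decommission` entries logged above. It is a rough illustration under stated assumptions: the regular expression and the `cookbook_durations` helper are hypothetical, entries are assumed to be newest-first within a single day, and results are only accurate to the minute because the log records `HH:MM`.

```python
import re
from datetime import datetime

# Hypothetical pattern for the "START"/"END (RESULT)" cookbook lines in this log.
ENTRY = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<who>\S+): "
    r"(?P<kind>START|END \((?P<result>\w+)\)) - Cookbook (?P<cookbook>\S+)"
    r".*? for hosts? (?P<hosts>\S+)"
)

SAMPLE = [
    "13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts maps2002.codfw.wmnet",
    "13:35 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts maps2002.codfw.wmnet",
    "12:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1266.eqiad.wmnet",
    "12:39 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1266.eqiad.wmnet",
]


def cookbook_durations(lines):
    """Return (hosts, result, minutes) for each START/END pair, oldest first."""
    started = {}
    results = []
    for line in reversed(lines):  # the log is newest-first, so walk it backwards
        match = ENTRY.search(line)
        if match is None:
            continue  # not a cookbook START/END entry
        key = (match["cookbook"], match["hosts"])
        when = datetime.strptime(match["time"], "%H:%M")
        if match["kind"] == "START":
            started[key] = when
        elif key in started:
            elapsed = when - started.pop(key)
            minutes = int(elapsed.total_seconds() // 60)
            results.append((match["hosts"], match["result"], minutes))
    return results


if __name__ == "__main__":
    for hosts, result, minutes in cookbook_durations(SAMPLE):
        print(f"{hosts}: {result} after {minutes} min")
```

An END entry with no earlier START on the same day is skipped rather than guessed at, which also means runs that cross midnight are not reported.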

2021-06-30

  • 23:28 urbanecm: Evening B&C window finished
  • 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 667d880: Add Parsoid to wmgMonologChannels with warning level (duration: 01m 07s)
  • 23:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: 8e719d5: Add Parsoid to wmgMonologChannels (duration: 00m 38s)
  • 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8e719d5: Add Parsoid to wmgMonologChannels (duration: 01m 07s)
  • 21:43 Amir1: deleting auto-review logs from test2wiki (T285608)
  • 21:40 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T284931 T284459 T284394)
  • 21:29 cstone: civicrm revision changed from 789c92d13b to e07c2be1a7
  • 21:23 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T284931 T284459 T284394)
  • 19:06 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 07s)
  • 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
  • 18:57 legoktm: legoktm@mwmaint2002:~$ sudo systemctl start mediawiki_job_purge_parsercache_pc[123] # to start split purge jobs ahead of the timers
  • 18:54 legoktm: legoktm@mwmaint2002:~$ sudo systemctl stop mediawiki_job_parser_cache_purging.service # to stop zombie service
  • 18:53 Amir1: adding urbanecm as admin of newprojects mailing list
  • 18:12 Jeff_Green: authdns-update to deploy A/PTR records for frdev1002.frack.eqiad.wmnet
  • 17:57 thcipriani: restart ci jenkins following upgrade
  • 17:54 thcipriani: restart releases-jenkins following upgrade
  • 17:16 moritzm: imported jenkins 2.289.2 to thirdparty/ci T285532
  • 16:30 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki 'Tech/Server_switch_2020' 'Tech/Server_switch' 'Martin Urbanec' --move-subpages --reason='per phab:T285866' # T285866
  • 16:10 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 46s)
  • 16:08 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 01s)
  • 16:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating banwikisource (T284389) (duration: 01m 20s)
  • 16:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating banwikisource (T284389) (duration: 01m 16s)
  • 16:03 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating banwikisource (T284389) (duration: 01m 17s)
  • 16:02 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating banwikisource (T284389)
  • 16:00 urbanecm@deploy1002: Synchronized dblists: Creating banwikisource (T284389) (duration: 01m 17s)
  • 15:58 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating banwikisource (T284389) (duration: 01m 14s)
  • 15:57 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating banwikisource (T284389) (duration: 01m 13s)
  • 15:48 urbanecm@deploy1002: Synchronized langlist: Creating shiwiki (T284885) (duration: 01m 16s)
  • 15:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating shiwiki (T284885) (duration: 01m 16s)
  • 15:46 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating shiwiki (T284885) (duration: 01m 13s)
  • 15:44 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating shiwiki (T284885) (duration: 01m 15s)
  • 15:43 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating shiwiki (T284885)
  • 15:41 urbanecm@deploy1002: Synchronized dblists: Creating shiwiki (T284885) (duration: 01m 14s)
  • 15:40 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating shiwiki (T284885) (duration: 01m 14s)
  • 15:38 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating shiwiki (T284885) (duration: 01m 14s)
  • 15:31 urbanecm@deploy1002: Synchronized langlist: Creating dagwiki (T284450) (duration: 01m 12s)
  • 15:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating dagwiki (T284450) (duration: 01m 14s)
  • 15:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating dagwiki (T284450) (duration: 01m 16s)
  • 15:27 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating dagwiki (T284450) (duration: 01m 16s)
  • 15:26 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating dagwiki (T284450)
  • 15:25 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=dagwiki --cluster=all # T284450
  • 15:24 urbanecm@deploy1002: Synchronized dblists: Creating dagwiki (T284450) (duration: 01m 16s)
  • 15:22 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating dagwiki (T284450) (duration: 01m 13s)
  • 15:21 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating dagwiki (T284450) (duration: 01m 16s)
  • 15:07 sukhe: restarted dnsdist.service and pdns-recursor.service on O:wikidough to install gnutls/gcrypt updates
  • 15:06 urbanecm: sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1'
  • 13:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 13:26 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
  • 13:26 moritzm: installing fluidsynth security updates on stretch
  • 13:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
  • 13:18 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
  • 13:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
  • 13:11 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
  • 13:04 mutante: switching docker-registry to nginx light variant T164456
  • 13:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
  • 12:53 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
  • 12:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
  • 12:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
  • 12:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 12:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 12:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
  • 12:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
  • 12:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
  • 12:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
  • 12:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
  • 12:17 kart_: Updated cxserver to 2021-06-30-112813-production (T284900, T284885)
  • 12:13 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
  • 12:11 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:06 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:01 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 11:46 Lucas_WMDE: EU backport+config window done
  • 11:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseClientRepoConceptBaseUri (T257260) (2/2, beta) (disregard the earlier /3, I’m skipping the test file after all) (duration: 01m 04s)
  • 11:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseClientRepoConceptBaseUri (T257260) (1/3, prod) (duration: 01m 16s)
  • 11:35 moritzm: rolling restart of FPM/Apache on mw canaries to pick up gnutls/gcrypt security updates
  • 11:11 moritzm: installing libgcrypt security updates on buster
  • 11:09 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug2001:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1' # clean up old l10n cache
  • 11:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting Wikibase client repoConceptBaseUri (T257260) (duration: 01m 24s)
  • 10:44 moritzm: installing gnutls security updates on buster
  • 10:31 godog: add 200G to prometheus/eqiad for 'ops' instance
  • 09:35 godog: start swiftrepl-mw on ms-fe2005 post-switchover (credentials were missing) - T162123
  • 08:51 jelto: jelto@puppetmaster1001:~$ sudo puppet cert -s gitlab2001.wikimedia.org # approve puppet certificate request for gitlab2001, fingerprint checked
  • 08:47 topranks: Removing BGP peers for AS48237 (Etihad Etisalat) and AS11404 (Wave Division Holdings) from cr2-eqiad (peers have left Equinix IX)
  • 08:31 godog: remove sdf1 from thanos-be1003 in swift - T285835
  • 07:43 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thanos-be1003.eqiad.wmnet
  • 07:43 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
  • 07:43 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host thanos-be1003.eqiad.wmnet
  • 07:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
  • 05:46 ryankemper: [Cirrus] Unbanned `elastic2045`; now only `elastic2033` is banned in `codfw`
  • 00:36 tstarling@deploy1002: Synchronized wmf-config/db-labs.php: gerrit 701995 SQL query log (duration: 01m 05s)
  • 00:35 tstarling@deploy1002: Synchronized wmf-config/db-eqiad.php: gerrit 701995 SQL query log (duration: 01m 06s)
  • 00:34 tstarling@deploy1002: Synchronized wmf-config/db-codfw.php: gerrit 701995 SQL query log (duration: 01m 06s)
  • 00:32 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: gerrit 701995 SQL query log (duration: 01m 05s)
  • 00:31 tstarling@deploy1002: Synchronized docroot/noc/db.php: gerrit 701995 SQL query log (duration: 01m 06s)
  • 00:27 tstarling@deploy1002: Synchronized wmf-config/logging.php: gerrit 701995 SQL query log (duration: 01m 15s)
  • 00:01 urbanecm: (following up previous SAL item) TrainBranchBot was removed from wmf-deployment group because of T285819

2021-06-29

  • 23:45 urbanecm: Evening B&C window done
  • 23:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 367bc98: 904d18720: flood flag changes for enwikibooks (T285594) (duration: 01m 07s)
  • 23:45 urbanecm: Remove TrainBranchBot from wmf-deployment Gerrit group, merges code to mediawiki-config without actually deploying it
  • 23:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: 8a5b835: SpecialEditGrowthConfig: Do not use relative => true (T285750) (duration: 01m 04s)
  • 23:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: c61fb17: SpecialEditGrowthConfig: Do not use relative => true (T285750) (duration: 01m 05s)
  • 23:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/: bad8266: Config option to enable topic subscriptions backend and dtenable=1 URL parameter (T284491) (duration: 01m 05s)
  • 23:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/: bad8266: Config option to enable topic subscriptions backend and dtenable=1 URL parameter (T284491) (duration: 01m 06s)
  • 23:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/: e77e002: Config option to enable topic subscriptions backend and dtenable=1 URL parameter (T284491) (duration: 01m 09s)
  • 21:58 maryum: deployed security patch T285515 to wmf.12
  • 21:51 maryum: deployed security patch T285515 to wmf.11
  • 21:44 maryum: deployed updated security patch for T285190 to wmf.12
  • 21:42 maryum: deployed updated security patch for T285190 to wmf.11
  • 21:31 sbassett: Reverted and deployed updated security patch for T285190 to wmf.12
  • 21:29 sbassett: Reverted and deployed updated security patch for T285190 to wmf.11
  • 21:19 sbassett: Deployed updated security patch for T285190 to wmf.11
  • 20:55 dancy: Deleted all CDB files on beta so they'll be recreated on the next scap sync-world run
  • 20:26 dancy: Reverting to scap 3.17.1-1+0~20210419163335.8~1.gbpa6b2e0 in beta
  • 19:43 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: REIMAGE
  • 19:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: REIMAGE
  • 19:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: REIMAGE
  • 19:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: REIMAGE
  • 19:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: REIMAGE
  • 19:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: REIMAGE
  • 19:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: REIMAGE
  • 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: REIMAGE
  • 19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.12
  • 18:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 18:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 18:34 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc3
  • 18:28 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc2
  • 18:21 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc1
  • 18:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 18:09 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 18:07 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.7 (duration: 04m 00s)
  • 17:59 urbanecm: Start server-side upload of ~2.5G of JPG files (T282755)
  • 17:52 brennen@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.12 (duration: 57m 11s)
  • 16:55 ryankemper: T281327 `[Cirrus -> codfw]` Current banned nodes are `elastic2043` and `elastic2045`; `elastic2043` can be unbanned after a re-image, and `elastic2045` can be unbanned in ~30 minutes after shards rebalance (had heavy shards scheduled)
  • 16:55 brennen@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.12
  • 16:45 brennen: 1.37.0-wmf.12 was branched at 3703c31 for T281153
  • 16:28 ebernhardson: temporarily ban elastic2045 from production-search-codfw
  • 15:43 dcausse: unbanning elastic2054
  • 15:30 dcausse: restarting blazegraph on wdqs1012
  • 15:17 effie: pool mw2383 back
  • 15:15 mutante: [mwlog2002:~] $ sudo systemctl start mw-log-cleanup
  • 15:06 dcausse: banning elastic2054
  • 14:53 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[1-2].codfw.wmnet,service=canary
  • 14:52 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[8-9].codfw.wmnet,service=canary
  • 14:52 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw225[1-2].codfw.wmnet,service=canary
  • 14:52 effie: depool mw2383 as it is misbehaving
  • 14:47 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 14:47 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 14:47 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw226[1-2].codfw.wmnet
  • 14:47 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2290.codfw.wmnet
  • 14:46 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:46 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw22[7-8][0-9].codfw.wmnet
  • 14:45 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw225[1-8].codfw.wmnet
  • 14:44 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:44 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw225[1-8].codfw.wmnet,service=api_appserver
  • 14:43 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
  • 14:38 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
  • 14:38 _joe_: restarting php-fpm on mw2383
  • 14:38 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 14:37 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2103 (s1) weight a bit', diff saved to https://phabricator.wikimedia.org/P16739 and previous config saved to /var/cache/conftool/dbconfig/20210629-143742-marostegui.json
  • 14:37 _joe_: repooling mw2383
  • 14:36 _joe_: depooling mw2383
  • 14:30 legoktm@deploy1002: Synchronized wmf-config/db-codfw.php: fix trwikivoyage (duration: 01m 01s)
  • 14:29 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 14:29 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 14:28 Krinkle: TODO: Don't duplicate `sectionsByDB` between db-* files
  • 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:23 jayme@cumin1001: MediaWiki read-only period ends at: 2021-06-29 14:23:23.504447
  • 14:23 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:23 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 14:22 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 14:22 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 14:22 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 14:22 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 14:21 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:21 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 14:21 jayme@cumin1001: MediaWiki read-only period starts at: 2021-06-29 14:21:26.671853
  • 14:21 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 14:15 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 14:15 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 14:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 44 hosts with reason: DC switchover
  • 14:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 44 hosts with reason: DC switchover
  • 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 14:11 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 14:10 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 14:09 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 14:08 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 14:02 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:01 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 14:01 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 13:51 otto@deploy1002: Started deploy [analytics/refinery@edc31a2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@edc31a2]
  • 13:49 otto@deploy1002: Finished deploy [analytics/refinery@edc31a2] (thin): Regular analytics weekly train THIN [analytics/refinery@edc31a2] (duration: 00m 07s)
  • 13:49 otto@deploy1002: Started deploy [analytics/refinery@edc31a2] (thin): Regular analytics weekly train THIN [analytics/refinery@edc31a2]
  • 13:49 otto@deploy1002: Finished deploy [analytics/refinery@edc31a2]: Regular analytics weekly train [analytics/refinery@COMMIT_HASH] (duration: 17m 42s)
  • 13:35 volker-e@deploy1002: Finished deploy [design/style-guide@e97fccb]: Deploy design/style-guide: e97fccb styles: Add internationalization and accessibility note labels and treatments (#476) (duration: 00m 07s)
  • 13:34 volker-e@deploy1002: Started deploy [design/style-guide@e97fccb]: Deploy design/style-guide: e97fccb styles: Add internationalization and accessibility note labels and treatments (#476)
  • 13:31 otto@deploy1002: Started deploy [analytics/refinery@edc31a2]: Regular analytics weekly train [analytics/refinery@COMMIT_HASH]
  • 11:54 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: vector: Finish enabling language switcher treatment A/B test on fawiki (T269093) (duration: 00m 56s)
  • 11:38 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/Wikibase/repo/: Backport: Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634), Part II (duration: 00m 58s)
  • 11:36 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/Wikibase/repo/includes/Rdf/PropertyStubRdfBuilder.php: Backport: Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634), Part I (duration: 00m 56s)
  • 11:35 ladsgroup@deploy1002: sync-file aborted: Backport: Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634) (duration: 00m 10s)
  • 10:30 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on acmechief* after switch towards nginx-light T164456
  • 09:27 moritzm: installing nettle security updates on buster
  • 08:47 elukey: repool mw13[55,84] after debugging - T285634
  • 08:46 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1384.eqiad.wmnet
  • 08:46 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1355.eqiad.wmnet
  • 08:43 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
  • 08:25 elukey: cumin 'A:mw-eqiad' '/usr/local/sbin/restart-php7.2-fpm' -b 2 -s 30 - T285634
  • 08:21 elukey: depool mw1355 (mw appserver) for debugging - T285634
  • 08:21 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1355.eqiad.wmnet
  • 08:12 hashar: Upgrading Jenkins on contint2001 / contint1001 and restarting CI Jenkins # T285531
  • 08:03 hashar: Upgraded Jenkins on releases1002 / releases2002 # T285531
  • 08:02 hashar: Upgraded Jenkins on releases1002 / releases2002
  • 07:50 godog: remove 20G migration data /root/prometheus from prometheus4001 - T243057
  • 07:48 godog: remove old /root/prometheus data from prometheus4001
  • 07:05 moritzm: upgrading bullseye early installs to the latest state of testing T275873
  • 06:46 tstarling@deploy1002: Synchronized php-1.37.0-wmf.11/includes/MediaWiki.php: Add statsd action timing metric T284274 (duration: 00m 58s)
  • 02:47 cdanis: ✔️ cdanis@cumin2001.codfw.wmnet ~ 🕥🍺 sudo cumin -b16 'A:cp-upload and A:codfw' 'run-puppet-agent -q'
  • 02:34 ryankemper: T285643 Banned `elastic1039` from all 3 elasticsearch clusters and set `elastic1039.eqiad.wmnet` to failed in netbox
  • 02:27 cdanis: ✔️ cdanis@cumin2001.codfw.wmnet ~ 🕥🍺 sudo cumin -b16 'A:cp-upload' 'run-puppet-agent -q'
  • 02:25 eileen: civicrm revision changed from 927ab7cff7 to 789c92d13b, config revision is 1739c53fcb
  • 02:04 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@0e916b1]: 0.3.75 (duration: 08m 40s)
  • 01:57 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.75` on canary `wdqs1003`; proceeding to rest of fleet
  • 01:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@0e916b1]: 0.3.75
  • 01:50 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.75`. Pre-deploy tests passing on canary `wdqs1003`
  • 00:25 Krinkle: krinkle@mwmaint1002: purgeParserCache.php --tag pc1, ref T282761

2021-06-28

  • 23:07 urbanecm: Evening B&C window done
  • 23:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5ec855d: Enable Parsoid inspired media structure on test wikis (T51097) (duration: 00m 59s)
  • 22:51 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 22:51 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 22:50 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 22:48 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 22:48 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
  • 22:44 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
  • 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 22:43 legoktm@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2021-06-28 22:43:04.512602
  • 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 22:42 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 22:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 22:42 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 22:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 22:42 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 22:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 22:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 22:41 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 22:41 legoktm@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2021-06-28 22:41:41.222740
  • 22:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 22:40 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 22:40 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 22:40 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 22:38 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 22:38 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 22:32 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 22:32 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 22:32 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 22:31 legoktm: starting DC switchover live test, which will "switch" us from codfw -> eqiad
  • 22:28 eileen: civicrm revision changed from 9d1203fb28 to 927ab7cff7, config revision is 1739c53fcb
  • 22:09 legoktm: live-hacked spicerack on cumin1001 to ignore x2, see https://phabricator.wikimedia.org/T285519#7182377
  • 21:55 Krinkle: krinkle@mwmaint1002: purgeParserCache.php --tag pc2, ref T282761
  • 20:03 cstone: payments-wiki revision is d9892207c1
  • 19:48 krinkle@deploy1002: Synchronized php-1.37.0-wmf.11/maintenance/: I618bc1 (duration: 00m 56s)
  • 19:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.11/includes/libs/objectcache/: T282761 - I618bc1 (duration: 00m 56s)
  • 19:45 krinkle@deploy1002: Synchronized php-1.37.0-wmf.11/includes/objectcache/SqlBagOStuff.php: T282761 - I618bc1 (duration: 00m 59s)
  • 18:40 ebernhardson@deploy1002: Synchronized wmf-config/: T281515: Prepare Cirrus more_like for dc switchover (duration: 01m 02s)
  • 18:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/WelcomeSurveyHooks.php: ecf1d6c: Make it possible to force opt-in/opt-out to Growth features during account creation (T284119; T284800; 3/3) (duration: 00m 55s)
  • 18:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/HelpPanelHooks.php: ecf1d6c: Make it possible to force opt-in/opt-out to Growth features during account creation (T284119; T284800; 2/3) (duration: 00m 55s)
  • 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/HomepageHooks.php: ecf1d6c: Make it possible to force opt-in/opt-out to Growth features during account creation (T284119; T284800; 1/3) (duration: 00m 58s)
  • 18:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/VisualEditor/: 794a46c: Hotfix for broken "Extract show all to placeholder class" (T284636; T285571) (duration: 00m 57s)
  • 18:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4ae0fdd: Enable DiscussionTools topicsubscription as beta feature on partner wikis (T274280) (duration: 00m 57s)
  • 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5b59184: Remove redundant wgDiscussionToolsEnable overrides (duration: 00m 56s)
  • 18:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1043c93: Growth: Enable community configuration at all Growth wikis (T285423) (duration: 00m 56s)
  • 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:44 sukhe: Traffic: depool eqiad from user traffic
  • 15:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:30 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-.*,name=eqiad
  • 15:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:09 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 15:08 jayme@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 15:07 gehel: restarting wdqs-updater on all wdqs hosts for new configuration
  • 14:54 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 14:53 jayme@cumin1001: Switching services swift, proton, mathoid, restbase, swift-ro, eventstreams, search, shellbox, eventgate-analytics-external, wdqs-internal, kartotherian, api-gateway, termbox, mobileapps, similar-users, wikifeeds, apertium, restbase-async, eventgate-main, eventgate-logging-external, ores, sessionstore, linkrecommendation, echostore, push-notifications, citoid, zotero, eventgate-analytics, wdqs, eventstreams-i
  • 14:53 jayme@cumin1001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 14:37 jayme@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=99)
  • 14:36 jayme@cumin1001: Switching services kartotherian, proton, wdqs-internal, wikifeeds, zotero, recommendation-api, swift-ro, linkrecommendation, mobileapps, citoid, eventgate-analytics, push-notifications, eventstreams-internal, mathoid, similar-users, schema, apertium, restbase-async, shellbox, termbox, wdqs, ores, eventgate-analytics-external, swift, helm-charts, restbase, cxserver, search, sessionstore, eventstreams, api-gate
  • 14:36 jayme@cumin1001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 14:35 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 14:29 jayme@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 14:21 effie: restarted mw[1322,1329,1333,1350,1351,1352,1353,1354,1366,1367,1368,1370,1372,1373]
  • 14:07 effie: restarting busy php-fpm app servers
  • 13:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseRepoForeignRepositories (T257260) (2/2, beta) (duration: 00m 57s)
  • 13:06 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseRepoForeignRepositories (T257260) (1/2, prod) (duration: 00m 57s)
  • 12:59 moritzm: installing intel-microcode security updates on buster
  • 12:30 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes/media/MediaHandler.php: Backport: media: Handle lack of 'metadata' key from getSizeAndMetadata gracefully (T285490) (duration: 00m 56s)
  • 12:24 dcausse: repool wdqs1012
  • 12:00 Lucas_WMDE: EU backport+config window done
  • 11:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting Wikibase repo foreignRepositories (T257260) (duration: 00m 55s)
  • 11:40 XioNoX: push "Port cloud-in4 to Capirca" to cr1/2-eqiad
  • 11:38 XioNoX: push "Port cloud-in4 to Capirca" to cr1/2-codfw
  • 11:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e4a088f: vector: Enable language switcher treatment A/B test on fawiki (T269093) (duration: 00m 55s)
  • 11:28 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/modules/signup/campaign.less: cd16aa2: Donor campaign: fix signup page styling (T284740) (duration: 00m 56s)
  • 11:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9495d18: GrowthExperiments: Update campaign pattern (T284800) (duration: 00m 56s)
  • 11:20 Lucas_WMDE: lucaswerkmeister-wmde@mw1384:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache && rmdir /srv/mediawiki/php-1.37.0-wmf.1' # per comments in T157030 and similar tasks
  • 11:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from buster master maps1009
  • 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from buster master maps1009
  • 11:18 Lucas_WMDE: lucaswerkmeister-wmde@mw1384:~$ scap pull # did not print any errors
  • 11:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ade641b: Deploy ContentTranslation out of Beta feature in 9 WPs (T284641) (duration: 00m 56s)
  • 10:44 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:43 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:25 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2007.codfw.wmnet with reason: REIMAGE
  • 10:23 mutante: sodium - restarted nginx
  • 10:23 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2007.codfw.wmnet with reason: REIMAGE
  • 10:22 mutante: sodium (mirrors.wikimedia.org) - switching to nginx light variant T164456
  • 10:11 vgutierrez: rolling upgrade of ATS on eqiad - T285535
  • 10:11 moritzm: installing remaining libxml2 security updates
  • 09:52 vgutierrez: rolling upgrade of ATS on esams - T285535
  • 09:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseClientChangesDatabase (T257260) (2/2, beta) (duration: 00m 56s)
  • 09:41 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseClientChangesDatabase (T257260) (1/2, prod) (duration: 00m 57s)
  • 09:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org
  • 09:40 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
  • 09:39 Lucas_WMDE: ^ wrong gerrit change used for message, sorry
  • 09:39 lucaswerkmeister-wmde@deploy1002: sync-file aborted: Config: Stop setting Wikibase repo foreignRepositories (T257260) (1/2, prod) (duration: 00m 10s)
  • 09:27 vgutierrez: rolling upgrade of ATS on eqsin - T285535
  • 09:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting Wikibase client changesDatabase (T257260) (duration: 00m 55s)
  • 08:56 vgutierrez: rolling upgrade of ATS on codfw - T285535
  • 08:53 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Set idGeneratorInErrorPingLimiter to 9 for Wikidata (T284538), Part II (duration: 00m 57s)
  • 08:51 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set idGeneratorInErrorPingLimiter to 9 for Wikidata (T284538), Part I (duration: 00m 56s)
  • 08:48 mutante: phab1001 - removing 2fa for my own account
  • 08:40 vgutierrez: rolling upgrade of ATS on ulsfo - T285535
  • 08:40 jayme: drain kubestage2002 for docker restart(s)
  • 08:33 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove idGeneratorRateLimiting from production config (T274157), Part II (duration: 00m 55s)
  • 08:31 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Remove idGeneratorRateLimiting from production config (T274157), Part I (duration: 00m 58s)
  • 08:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove special configurations for Dagbani in Wikibase code (T283168) (duration: 00m 56s)
  • 08:25 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
  • 08:23 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
  • 08:21 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Set Wikidata's main sandbox item (T219215), Part II (duration: 00m 56s)
  • 08:19 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set Wikidata's main sandbox item (T219215), Part I (duration: 00m 57s)
  • 08:19 jynus: stop and remove db1145:s5 db2099:s5 T283235
  • 07:58 dcausse: depool and restart blazegraph on wdqs1012
  • 07:57 jelto: jelto@cumin1001:~$ sudo cumin install* 'run-puppet-agent' # update DHCP entry for gitlab2001 on install[1003,2003,3001,4001,5001].wikimedia.org
  • 07:57 dcausse: repool wdqs1005
  • 07:46 hashar@deploy1002: Finished deploy [integration/docroot@cf677eb]: integration: Change agents dashboard link from Nagf to Grafana (duration: 00m 08s)
  • 07:46 hashar@deploy1002: Started deploy [integration/docroot@cf677eb]: integration: Change agents dashboard link from Nagf to Grafana
  • 06:16 XioNoX: remove BGP to AS13768 in AMS-IX

2021-06-27

  • 09:10 elukey: cumin 'A:mw-eqiad and not P{mw13[67,54,55,72,33,50,51,73,52,49,53,65,71,84,68,70,66,91,89,97,95,99,85,93,87]*} and not P{mw14[09,03,11,07,05,01]*} and not P{mw12[61-69]*} and not P{mwdebug*}' '/usr/local/sbin/restart-php7.2-fpm' -b 1 -s 30
  • 09:10 elukey: roll restart the remaining mw appservers to clear out apcu fragmentation (cumin command to follow)
  • 08:58 elukey: slow roll restart (cumin -b 1 -s 30) of mw126[1-7]'s php-fpm (75-80% of apcu fragmentation)
  • 08:37 elukey: restart php-fpm on mw1268 mw1269 - low idle workers
  • 08:23 elukey: restart php-fpm on mw1401

2021-06-26

  • 21:28 volans: upgraded spicerack to v0.0.56 on the cumin hosts (includes only bug fixes for the switchdc)
  • 21:23 volans: uploaded spicerack_0.0.56 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 16:37 elukey: restart php-fpm on mw1387
  • 15:43 elukey: restart php-fpm on mw1393
  • 15:39 elukey: restart php-fpm on mw1405 mw1399 mw1385
  • 15:37 elukey: restart php-fpm on mw1397 mw1395 mw1411 mw1407
  • 15:31 elukey: restart php-fpm on mw1391 mw1389 mw1403
  • 13:49 elukey: restart php-fpm on mw1368 mw1370 mw1366 mw1409
  • 13:43 elukey: depool mw1384 for investigation
  • 13:43 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1384.eqiad.wmnet
  • 13:33 elukey: restart php-fpm on mw1353 mw1365 mw1371
  • 13:30 elukey: restart php-fpm on mw1351 mw1373 mw1352 mw1349
  • 13:23 elukey: restart-phpfpm on mw1350 (0 idle php workers)
  • 13:20 elukey: restart-phpfpm on mw1333 (0 idle php workers)
  • 10:08 elukey: restart php-fpm on mw1372 - T285593
  • 10:07 elukey: restart php-fpm on mw1372 - T285593
  • 09:45 elukey: restart php-fpm on mw135[4-5]
  • 09:44 elukey: restart php-fpm on mw1354
  • 09:38 elukey: reboot mw1414 (not reachable via ssh, nor via mgmt console)
  • 09:33 elukey: restart php-fpm on mw1367 (php fatal memory errors, php7adm /apcu-frag returns errors)

2021-06-25

  • 21:37 ebernhardson@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/CirrusSearch/: cirrus: Revert "Stop querying ores_articletopic" (3/3) (duration: 01m 01s)
  • 21:35 ebernhardson@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/CirrusSearch/includes/Wikimedia/WeightedTagsHooks.php: cirrus: Revert "Stop querying ores_articletopic" (2/3) (duration: 00m 58s)
  • 21:34 ebernhardson@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/CirrusSearch/includes/Parser/FullTextKeywordRegistry.php: cirrus: Revert "Stop querying ores_articletopic" (1/3) (duration: 00m 58s)
  • 20:32 legoktm: legoktm@mwmaint1002:~$ sudo systemctl reset-failed # to clear icinga alert
  • 20:28 legoktm: legoktm@mwmaint1002:~$ sudo systemctl start mediawiki_job_update_special_pages.service (T285583)
  • 20:21 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.Homepage.SuggestedEdits.js: eaec745: SuggestedEdits: Only log task impression for EditCardWidget (T283546; emergency deployment) (duration: 01m 00s)
  • 18:08 legoktm: legoktm@ms-fe2005:~$ sudo systemctl unmask swiftrepl-mw.service
  • 15:46 mutante: mw1326, mw1327, mw1328, mw1329 ... restarted php-fpm
  • 15:41 mutante: mw1330, mw1320, mw1321, mw1322 - restarted php-fpm
  • 15:38 mutante: [mw1330:~] $ sudo restart-php7.2-fpm
  • 15:36 mutante: [mw1332:~] $ sudo restart-php7.2-fpm
  • 15:28 mutante: [mw1319:~] $ sudo restart-php7.2-fpm
  • 15:20 rzl: rzl@mw1320:~$ sudo restart-php7.2-fpm # workers stuck since the ~14:00 request spike
  • 15:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:44 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab2001.wikimedia.org
  • 14:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps2007.codfw.wmnet with reason: reimaging as buster replica
  • 14:28 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps2007.codfw.wmnet with reason: reimaging as buster replica
  • 13:50 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab2001.wikimedia.org
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 13:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 13:08 vgutierrez: update ATS to version 8.0.8-1wm4 on cp4026 and cp4032 - T285535
  • 13:06 vgutierrez: upload trafficserver 8.0.8-1wm4 to apt.wm.o (buster) - T285535
  • 12:28 moritzm: installing nmap bugfix update from Buster point release
  • 12:28 moritzm: installing nmap bugfix update from Buster point release
  • 11:28 moritzm: installing 4.19.194 kernels on Buster from latest 10.10 point release (no reboots, just rolling out the packages)
  • 09:15 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol[1003-1005].wikimedia.org with reason: openstack issue
  • 09:15 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol[1003-1005].wikimedia.org with reason: openstack issue
  • 09:13 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on cloudcontrol1003.wikimedia.org with reason: Known issue, working on it
  • 09:13 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on cloudcontrol1003.wikimedia.org with reason: Known issue, working on it
  • 09:04 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
  • 09:02 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
  • 09:02 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet
  • 08:55 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
  • 08:54 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2001.codfw.wmnet
  • 08:52 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2001.codfw.wmnet
  • 08:52 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2002.codfw.wmnet
  • 08:51 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2002.codfw.wmnet
  • 08:51 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2003.codfw.wmnet
  • 08:48 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2003.codfw.wmnet
  • 08:12 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
  • 08:07 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
  • 08:07 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet
  • 08:04 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1002.eqiad.wmnet
  • 08:04 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
  • 08:01 elukey: reboot an-worker1101 to unblock stuck GPU
  • 08:00 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
  • 08:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
  • 07:58 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
  • 07:58 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
  • 07:57 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
  • 07:42 moritzm: imported Jenkins 2.289.1 to thirdparty/ci for buster-wikimedia T285531
  • 07:30 dcausse: depool and restart blazegraph on wdqs1005
  • 07:17 dcausse: installing openjdk-8-dbg on wdqs1005 to debug blazegraph

2021-06-24

  • 23:02 legoktm: reverted cumin1001 spicerack live hacks
  • 22:57 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 22:55 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 22:55 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 22:55 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 22:36 volans: set x2 codfw master back to RW
  • 22:30 legoktm@cumin1001: END (ERROR) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=97)
  • 22:29 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 22:29 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 22:29 legoktm@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2021-06-24 22:29:25.643909
  • 22:29 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 22:09 legoktm@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
  • 22:09 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 22:06 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 22:05 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 22:04 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 22:04 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 22:01 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 22:01 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 21:59 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 21:59 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 21:47 legoktm: live hacked spicerack on cumin1001 to revert https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/700963/
  • 20:58 legoktm: starting dry run and live test of DC switchover
  • 20:53 legoktm: legoktm@phab1001:~$ sudo /srv/phab/phabricator/bin/remove destroy M320 (spam)
  • 20:44 volans: uploaded spicerack_0.0.55 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 20:28 legoktm: re-enabled daily digests for wikimedia-l - T285486
  • 19:10 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.11
  • 19:07 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 04s)
  • 19:06 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
  • 19:04 dduvall: preparing to roll group2 to 1.37.0-wmf.11 (T281152) (cc risky patch contacts Amir1 Krinkle DannyS712)
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:18 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:12 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:12 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 06s)
  • 17:11 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
  • 17:08 dduvall: re-rolling group1 to 1.37.0-wmf.11 (T281152) following deployment of blocker fixes (cc risky patch contacts Amir1 Krinkle DannyS712)
  • 16:12 twentyafterfour: restarted php7.3-fpm on phab1001
  • 15:43 hnowlan: running `nodetool decommission` on maps2007
  • 15:42 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
  • 15:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2007.codfw.wmnet with reason: depooling and reimaging as buster replica
  • 15:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2007.codfw.wmnet with reason: depooling and reimaging as buster replica
  • 15:31 moritzm: installing jackson-databind security updates
  • 15:26 moritzm: installing ruby-websocket-extensions security updates
  • 15:02 hnowlan: reenabling puppet on P{C:Postgresql::Slave}
  • 14:59 moritzm: restarting mw canaries to pick up libxml2 security update
  • 14:57 moritzm: installing libxml2 security updates on buster
  • 14:46 hnowlan: Disabling puppet on P{C:Postgresql::Slave} (netboxdb2001,puppetdb2002, most maps hosts) to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/700071
  • 13:29 volans: uploaded python3-wmflib_0.0.8 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 12:45 tgr: EU deploys done
  • 12:44 tgr@deploy1002: Finished scap: Backport: Re-apply "Add custom signup flow for donors", step 3 (T284799 T284740 T284800 T285281) (duration: 26m 07s)
  • 12:18 tgr@deploy1002: Started scap: Backport: Re-apply "Add custom signup flow for donors", step 3 (T284799 T284740 T284800 T285281)
  • 12:08 tgr@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments: Backport: Re-apply "Add custom signup flow for donors", step 2 (T284799 T284740 T284800 T285281) (duration: 01m 06s)
  • 11:53 jayme: import dragonfly_1.0.6-1 into buster-wikimedia
  • 11:44 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on registry2008.codfw.wmnet with reason: Dragonfly tests (jayme)
  • 11:44 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on registry2008.codfw.wmnet with reason: Dragonfly tests (jayme)
  • 11:37 jayme: depooling registry2008 for some dragonfly testing
  • 11:37 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=registry2008.codfw.wmnet,dc=codfw,cluster=docker-registry
  • 11:34 tgr@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments: Backport: Re-apply "Add custom signup flow for donors", step 1 (T284799 T284740 T284800 T285281) (duration: 01m 06s)
  • 11:25 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update $wgNamespacesToBeSearchedDefault for wikimania (T284793) (duration: 01m 07s)
  • 11:21 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable OCR tool on all Wikisources (T285311) (duration: 01m 06s)
  • 11:11 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Enable link recommendation feature for more wikis (T284481) (duration: 01m 07s)
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16723 and previous config saved to /var/cache/conftool/dbconfig/20210624-092226-marostegui.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16722 and previous config saved to /var/cache/conftool/dbconfig/20210624-092157-marostegui.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16721 and previous config saved to /var/cache/conftool/dbconfig/20210624-092105-marostegui.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16720 and previous config saved to /var/cache/conftool/dbconfig/20210624-092029-marostegui.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16719 and previous config saved to /var/cache/conftool/dbconfig/20210624-091949-marostegui.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s2 weights T284897', diff saved to https://phabricator.wikimedia.org/P16718 and previous config saved to /var/cache/conftool/dbconfig/20210624-091753-marostegui.json
  • 09:02 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes: Backport: media: Make the file metadata "_error" check looser (T285431) (duration: 01m 12s)
  • 08:55 legoktm: root@lists1001:/var/log/mailman# rm -rf *
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s3 weights T284897', diff saved to https://phabricator.wikimedia.org/P16717 and previous config saved to /var/cache/conftool/dbconfig/20210624-084147-marostegui.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 weights T284897', diff saved to https://phabricator.wikimedia.org/P16716 and previous config saved to /var/cache/conftool/dbconfig/20210624-081409-marostegui.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 weights T284897', diff saved to https://phabricator.wikimedia.org/P16715 and previous config saved to /var/cache/conftool/dbconfig/20210624-081251-marostegui.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 weights T284897', diff saved to https://phabricator.wikimedia.org/P16714 and previous config saved to /var/cache/conftool/dbconfig/20210624-081137-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1130 from s5 api T284897', diff saved to https://phabricator.wikimedia.org/P16713 and previous config saved to /var/cache/conftool/dbconfig/20210624-080945-marostegui.json
  • 08:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on 216 hosts with reason: Change replication monitoring config T284897
  • 08:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 0:45:00 on 216 hosts with reason: Change replication monitoring config T284897
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s6 weights T284897', diff saved to https://phabricator.wikimedia.org/P16712 and previous config saved to /var/cache/conftool/dbconfig/20210624-075613-marostegui.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s7 weights T284897', diff saved to https://phabricator.wikimedia.org/P16711 and previous config saved to /var/cache/conftool/dbconfig/20210624-074200-marostegui.json
  • 07:35 eileen: civicrm revision changed from 6d3dd6e5a5 to 9d1203fb28, config revision is 735af27f0d
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s8 weights T284897', diff saved to https://phabricator.wikimedia.org/P16710 and previous config saved to /var/cache/conftool/dbconfig/20210624-072657-marostegui.json
  • 03:57 dwisehaupt: civicrm revision is 6d3dd6e5a5, config revision is 735af27f0d
  • 03:26 dwisehaupt: civicrm revision is 6d3dd6e5a5, config revision is 1e8e9ac7b9
  • 00:25 eileen: civicrm revision changed from bd906975f0 to 6d3dd6e5a5, config revision is 821e5889f7
  • 00:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1447.eqiad.wmnet with reason: REIMAGE
  • 00:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
  • 00:13 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1447.eqiad.wmnet with reason: REIMAGE
  • 00:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
  • 00:11 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
  • 00:10 eileen: process-control config revision is 821e5889f7
  • 00:09 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
  • 00:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1443.eqiad.wmnet with reason: REIMAGE
  • 00:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1442.eqiad.wmnet with reason: REIMAGE
  • 00:05 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1443.eqiad.wmnet with reason: REIMAGE
  • 00:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1441.eqiad.wmnet with reason: REIMAGE
  • 00:03 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1442.eqiad.wmnet with reason: REIMAGE
  • 00:02 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
  • 00:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
  • 00:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1441.eqiad.wmnet with reason: REIMAGE

2021-06-23

  • 23:59 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
  • 23:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1438.eqiad.wmnet with reason: REIMAGE
  • 23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
  • 23:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1437.eqiad.wmnet with reason: REIMAGE
  • 23:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1438.eqiad.wmnet with reason: REIMAGE
  • 23:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
  • 23:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1437.eqiad.wmnet with reason: REIMAGE
  • 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
  • 23:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
  • 23:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
  • 23:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
  • 23:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1433.eqiad.wmnet with reason: REIMAGE
  • 23:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1432.eqiad.wmnet with reason: REIMAGE
  • 23:46 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
  • 23:45 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1431.eqiad.wmnet with reason: REIMAGE
  • 23:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1433.eqiad.wmnet with reason: REIMAGE
  • 23:43 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1432.eqiad.wmnet with reason: REIMAGE
  • 23:42 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1429.eqiad.wmnet with reason: REIMAGE
  • 23:41 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1430.eqiad.wmnet with reason: REIMAGE
  • 23:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1431.eqiad.wmnet with reason: REIMAGE
  • 23:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1428.eqiad.wmnet with reason: REIMAGE
  • 23:39 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1430.eqiad.wmnet with reason: REIMAGE
  • 23:37 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: REIMAGE
  • 23:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1427.eqiad.wmnet with reason: REIMAGE
  • 23:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1428.eqiad.wmnet with reason: REIMAGE
  • 23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1426.eqiad.wmnet with reason: REIMAGE
  • 23:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1427.eqiad.wmnet with reason: REIMAGE
  • 23:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1425.eqiad.wmnet with reason: REIMAGE
  • 23:31 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1426.eqiad.wmnet with reason: REIMAGE
  • 23:30 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1424.eqiad.wmnet with reason: REIMAGE
  • 23:29 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1425.eqiad.wmnet with reason: REIMAGE
  • 23:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1423.eqiad.wmnet with reason: REIMAGE
  • 23:27 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1424.eqiad.wmnet with reason: REIMAGE
  • 23:26 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
  • 23:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1423.eqiad.wmnet with reason: REIMAGE
  • 23:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
  • 23:23 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
  • 23:22 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.9
  • 23:21 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1420.eqiad.wmnet with reason: REIMAGE
  • 23:21 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
  • 23:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1419.eqiad.wmnet with reason: REIMAGE
  • 23:19 dduvall: rolling back 1.37.0-wmf.11 from group1 (T281152) due to reoccurrence of "PHP Notice: Undefined index: frameCount" now at PNGHandler.php:156 (T285431)
  • 23:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1420.eqiad.wmnet with reason: REIMAGE
  • 23:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1418.eqiad.wmnet with reason: REIMAGE
  • 23:17 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1419.eqiad.wmnet with reason: REIMAGE
  • 23:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1417.eqiad.wmnet with reason: REIMAGE
  • 23:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1418.eqiad.wmnet with reason: REIMAGE
  • 23:14 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 04s)
  • 23:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1416.eqiad.wmnet with reason: REIMAGE
  • 23:13 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
  • 23:13 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1417.eqiad.wmnet with reason: REIMAGE
  • 23:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1415.eqiad.wmnet with reason: REIMAGE
  • 23:11 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1416.eqiad.wmnet with reason: REIMAGE
  • 23:10 dduvall: re-rolling group1 to 1.37.0-wmf.11 (T281152) following deployment of blocker fixes
  • 23:09 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1415.eqiad.wmnet with reason: REIMAGE
  • 23:05 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes/media/GIFHandler.php: Backport: Check for _error in getting metadata array in GIFHandler (T285431) (duration: 01m 06s)
  • 22:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes/media/PNGHandler.php: Backport: Check for _error in getting metadata array in PNGHandler (T285431) (duration: 01m 06s)
  • 22:26 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1414.eqiad.wmnet with reason: REIMAGE
  • 22:24 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1414.eqiad.wmnet with reason: REIMAGE
  • 21:45 sbassett: Deployed updated security patch for T285190 to wmf.9 and wmf.11
  • 20:55 ejegg: updated payments-wiki from 42cfbe832d to d9892207c1
  • 20:38 eileen: civicrm revision changed from 53d103f672 to bd906975f0, config revision is 6a88618c3e
  • 20:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:42 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.9
  • 19:39 dduvall: rolling back wmf.11 from group1 due to increase in logspam possibly related to noted risky patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/693298 (cc T281152 and patch contact Amir1)
  • 19:35 herron: rebooting kafkamon hosts for updates
  • 19:26 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 06s)
  • 19:25 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
  • 19:20 dduvall: preparing to promote wmf.11 group1 (T281152) cc'ing risky patch contacts Amir1, Krinkle, DannyS712
  • 19:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6e0f5ad: Enable GrowthExperiments donor landing page for testing (T284799) (duration: 01m 05s)
  • 19:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/: 2338e53: Revert "Add custom signup flow for donors" (T284740; T284800; T285281) (duration: 01m 06s)
  • 18:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:55 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/: REVERT: 76e5fc9: Add custom signup flow for donors (T284740; T284800; T285281) (duration: 00m 38s)
  • 18:55 urbanecm@deploy1002: sync-file aborted: REVERT: 76e5fc9: Add custom signup flow for donors (T284740; T284800; T285281) (duration: 00m 01s)
  • 18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:54 urbanecm@deploy1002: Scap failed!: 6/9 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 18:53 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: 76e5fc9: Add custom signup flow for donors (T284740; T284800; T285281) (duration: 01m 07s)
  • 18:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/WikimediaEvents/extension.json: 01f034b: Finalize WMDEBanner* schema migration to Event Platform (T282562) (duration: 01m 05s)
  • 18:35 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: 17efbaf: EditGrowthConfig: Suggested edit "Learn more" link should support interwiki (T279886; T285385) (duration: 01m 06s)
  • 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3a2fc6e: Enable $wgSecurePollSingleTransferableVoteEnabled on beta sites (duration: 01m 05s)
  • 18:31 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@0535b94]: expect eventgate events for all datacenters, second try (duration: 09m 11s)
  • 18:22 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@0535b94]: expect eventgate events for all datacenters, second try
  • 18:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b4a7867: Make Growth features available to newcomers at lvwiki and skwiki (T278191; T284149) (duration: 01m 06s)
  • 17:58 herron: beginning rolling reboots of kafka-main100[1-5] for updates
  • 17:57 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable canary events for NavigationTiming ext streams - T271208, T266798 (duration: 01m 29s)
  • 17:07 herron: beginning rolling reboots of kafka-main200[1-5] for updates
  • 16:42 XioNoX: re-start sending traffic on the codfw-eqsin Telia transport link
  • 15:17 topranks: Removing peering to AS64050 / "BGP Consultancy Pte Ltd" at AMS-IX (cr2-esams). Peer has left IX.
  • 14:54 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s1
  • 14:53 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s8
  • 13:54 effie: rolling restart thanos-fe* to pick up new tegola-vector-tiles account - T283049
  • 13:45 volans: uploaded cumin_4.1.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:27 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s4
  • 12:59 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s3
  • 12:46 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s7
  • 12:35 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s6
  • 12:26 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s5
  • 12:15 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist s2 recountCategories.php --mode=pages && foreachwikiindblist s2 recountCategories.php --mode=subcats && foreachwikiindblist s2 recountCategories.php --mode=files # T170737
  • 11:46 XioNoX: Simplify labs-in4/6 firewall filters - CR700939
  • 11:10 topranks: Removing peering to AS39651 / "Com Hem AB" at AMS-IX (cr2-esams). Peer has left IX.
  • 10:44 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:35 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@9f16a6b]: (no justification provided) (duration: 00m 20s)
  • 09:35 mbsantos@deploy1002: Started deploy [kartotherian/deploy@9f16a6b]: (no justification provided)
  • 09:22 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:48 volans: sudo systemctl start ferm.service on thanos-fe2002 (DNS query timeout)
  • 08:34 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@9f16a6b]: (no justification provided) (duration: 00m 14s)
  • 08:34 mbsantos@deploy1002: Started deploy [kartotherian/deploy@9f16a6b]: (no justification provided)
  • 07:57 kart_: cxserver: Removed Matxin MT support and added more language support to Elia MT (T285199, T284900)
  • 07:54 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 07:49 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 07:46 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 07:26 legoktm: uploaded mailman3_3.3.3-1~bpo10+6_amd64.changes on apt1001
  • 07:08 legoktm: updating mailman packages on lists1001 and restarting (T285120, T280889)
  • 06:56 ryankemper: [WDQS] `ryankemper@wdqs1006:~$ sudo pool`
  • 06:37 ryankemper: [WDQS] `ryankemper@wdqs2001:~$ sudo pool`
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16703 and previous config saved to /var/cache/conftool/dbconfig/20210623-062819-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16702 and previous config saved to /var/cache/conftool/dbconfig/20210623-061316-root.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16701 and previous config saved to /var/cache/conftool/dbconfig/20210623-055812-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Start repooling db1100', diff saved to https://phabricator.wikimedia.org/P16700 and previous config saved to /var/cache/conftool/dbconfig/20210623-054252-marostegui.json
  • 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16699 and previous config saved to /var/cache/conftool/dbconfig/20210623-045217-root.json
  • 01:04 eileen: process-control config revision is 6a88618c3e
  • 00:50 eileen: civicrm revision changed from c745d4f075 to 03bead707d, config revision is 4ab72c1033
  • 00:40 legoktm: uploaded new versions of flufl.bounce_4.0-1_amd64.changes hyperkitty_1.3.4-2~bpo10+4_amd64.changes mailman3_3.3.3-1~bpo10+5_amd64.changes mailman-hyperkitty_1.1.0-10~bpo10+1_amd64.changes to apt1001
  • 00:02 Trey314159: reindexing Portuguese wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (T284185)

2021-06-22

  • 23:23 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable canary events for search event streams (duration: 01m 05s)
  • 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7865f27: Add unwatchedpages to rollbacker on frwiki (T285334) (duration: 01m 06s)
  • 23:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9a594f0: Enable Growth features in dark mode at nlwiki (T285254; 3/3) (duration: 01m 07s)
  • 23:05 urbanecm@deploy1002: Synchronized wmf-config/config/nlwiki.yaml: 9a594f0: Enable Growth features in dark mode at nlwiki (T285254; 2/3) (duration: 01m 05s)
  • 23:04 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 9a594f0: Enable Growth features in dark mode at nlwiki (T285254; 1/3) (duration: 01m 37s)
  • 22:42 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript recountCategories.php --wiki=zhwiki --mode=subcats # T170737
  • 22:41 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript recountCategories.php --wiki=zhwiki --mode=pages # T170737
  • 22:38 urbanecm: mwscript recountCategories.php --wiki=eowiktionary --mode={pages,subcats,files} (T170737)
  • 21:05 eileen: civicrm revision changed from 629bd3b7b7 to c745d4f075, config revision is 4ab72c1033
  • 21:05 ejegg: updated payments-wiki from 7be0534b91 to 42cfbe832d
  • 20:46 brennen: gitlab1001: running ansible to deploy CAS: stop marking users as external (T274461)
  • 20:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-web1001.eqiad.wmnet with reason: REIMAGE
  • 20:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-web1001.eqiad.wmnet with reason: REIMAGE
  • 20:12 Trey314159: reindexing Portuguese wikis on elastic@eqiad, elastic@codfw, and cloudelastic (T284185)
  • 20:12 Trey314159: reindexing Dutch wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (T284185)
  • 19:58 brennen: gitlab1001: run ansible to deploy https://gerrit.wikimedia.org/r/c/operations/gitlab-ansible/+/699812 (T264231)
  • 19:26 legoktm: set mediawiki-l message acceptance to discard non-member posts instead of reject
  • 19:09 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.11
  • 19:06 dduvall: preparing to promote wmf.11 group0 (T281152) cc'ing risky patch contacts Amir1, Krinkle, DannyS712
  • 19:01 dduvall@deploy1002: Pruned MediaWiki: 1.37.0-wmf.6 (duration: 03m 35s)
  • 18:46 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@75d35b4]: revert expect eventgate canary events in all dcs (duration: 04m 23s)
  • 18:42 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@75d35b4]: revert expect eventgate canary events in all dcs
  • 18:31 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thumbor1006.eqiad.wmnet with reason: REIMAGE
  • 18:30 awight@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/VisualEditor: Backport: Revert "Fall back from explicit parameter order to TemplateData sort" () (duration: 01m 09s)
  • 18:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thumbor1006.eqiad.wmnet with reason: REIMAGE
  • 18:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thumbor1005.eqiad.wmnet with reason: REIMAGE
  • 18:27 awight@deploy1002: sync-file aborted: Backport: Revert "Fall back from explicit parameter order to TemplateData sort" () (duration: 00m 40s)
  • 18:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thumbor1005.eqiad.wmnet with reason: REIMAGE
  • 18:19 legoktm: pulled in updates for thirdparty/kubeadm-k8s-1-18 buster-wikimedia on apt1001
  • 17:47 brennen: gitlab1001: run ansible to deploy https://gerrit.wikimedia.org/r/700851 (T274463)
  • 17:43 dduvall: testwikis to 1.37.0-wmf.11 (cc open blockers T285125 T285118 T271011)
  • 17:41 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.11 (duration: 30m 59s)
  • 17:21 moritzm: installing isc-dhcp security updates
  • 17:18 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:16 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:14 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:11 moritzm: installing ruby-websocket-extensions security updates
  • 17:10 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.11
  • 17:08 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:07 moritzm: installing velocity security updates
  • 17:07 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:04 dduvall: 1.37.0-wmf.11 was branched at c161d3b for T281152
  • 17:04 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 16:41 Trey314159: reindexing Dutch wikis on elastic@eqiad, elastic@codfw, and cloudelastic (T284185)
  • 14:57 dcausse@deploy1002: Finished deploy [wdqs/wdqs@b082ccc]: wdqs 0.3.74 (duration: 13m 26s)
  • 14:43 dcausse@deploy1002: Started deploy [wdqs/wdqs@b082ccc]: wdqs 0.3.74
  • 14:37 XioNoX: start updating analytics firewall rules to capirca generated ones on cr2-eqiad - T279429
  • 14:35 hoo: Updated the Wikidata property suggester with data from the 2021-05-31 JSON dump (with pre-applied T132839 workarounds)
  • 14:01 XioNoX: start updating analytics firewall rules to capirca generated ones on cr1-eqiad - T279429
  • 13:49 kormat: disabling puppet on A:db-all for T285079
  • 13:38 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki-staging/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=nlwiki --phab=T285254 # T285254
  • 13:37 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki-staging]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=nlwiki growthexperiments # T285254
  • 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Correctly enable Vector language switcher treatment A/B test (T269093) (duration: 00m 57s)
  • 13:29 urbanecm: urbanecm@mwmaint1002:~$ foreachwikiindblist growthexperiments extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments # T266913
  • 13:29 Trey314159: reindexing German wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (T284185)
  • 12:04 Lucas_WMDE: backport+config window done
  • 12:03 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable new Vector Languages-in-header feature & AB test for pilot wikis (T269093) (duration: 00m 56s)
  • 11:58 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug1001:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache && rmdir /srv/mediawiki/php-1.37.0-wmf.1' # per comments in T157030 and similar tasks
  • 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/UniversalLanguageSelector/: Backport: launchULS: Add context to interface.language.change hook (T280770) (duration: 00m 57s)
  • 11:35 moritzm: installing fluidsynth security updates
  • 11:17 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: enwiki: Remove 'collectionsaveascommunitypage' from the 'autoconfirmed' user group (T283523) (duration: 00m 56s)
  • 11:06 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16691 and previous config saved to /var/cache/conftool/dbconfig/20210622-110619-kormat.json
  • 10:51 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16690 and previous config saved to /var/cache/conftool/dbconfig/20210622-105115-kormat.json
  • 10:36 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16689 and previous config saved to /var/cache/conftool/dbconfig/20210622-103612-kormat.json
  • 10:21 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16688 and previous config saved to /var/cache/conftool/dbconfig/20210622-102108-kormat.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16687 and previous config saved to /var/cache/conftool/dbconfig/20210622-094019-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16686 and previous config saved to /var/cache/conftool/dbconfig/20210622-092515-root.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16685 and previous config saved to /var/cache/conftool/dbconfig/20210622-092056-root.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16684 and previous config saved to /var/cache/conftool/dbconfig/20210622-091012-root.json
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16683 and previous config saved to /var/cache/conftool/dbconfig/20210622-090552-root.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16682 and previous config saved to /var/cache/conftool/dbconfig/20210622-085508-root.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16681 and previous config saved to /var/cache/conftool/dbconfig/20210622-085049-root.json
  • 08:49 marostegui: Upgrade db1166
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16680 and previous config saved to /var/cache/conftool/dbconfig/20210622-084915-marostegui.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16679 and previous config saved to /var/cache/conftool/dbconfig/20210622-083545-root.json
  • 07:53 joe: uploaded wmf-certificates package to buster-wikimedia/main, T284417
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 T283499', diff saved to https://phabricator.wikimedia.org/P16678 and previous config saved to /var/cache/conftool/dbconfig/20210622-072828-marostegui.json
  • 06:43 dcausse: repool wdqs1005
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1100.eqiad.wmnet with reason: REIMAGE
  • 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1100.eqiad.wmnet with reason: REIMAGE
  • 05:06 marostegui: Stop replication on old s5 master (db1100) - T284529
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool old master running 10.1 T284529', diff saved to https://phabricator.wikimedia.org/P16677 and previous config saved to /var/cache/conftool/dbconfig/20210622-050602-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1130 to s5 master and set section read-write T284529', diff saved to https://phabricator.wikimedia.org/P16676 and previous config saved to /var/cache/conftool/dbconfig/20210622-050123-root.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T284529', diff saved to https://phabricator.wikimedia.org/P16675 and previous config saved to /var/cache/conftool/dbconfig/20210622-050036-root.json
  • 05:00 marostegui: Starting s5 eqiad failover from db1100 to db1130 - T284529
  • 04:20 marostegui: Start topology changes for s5 switchover T284529
  • 04:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s5 T284529
  • 04:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s5 T284529
  • 04:11 eileen: process-control config revision is 4ab72c1033
  • 01:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti2026.codfw.wmnet with reason: REIMAGE
  • 01:00 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2026.codfw.wmnet with reason: REIMAGE
  • 00:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti2025.codfw.wmnet with reason: REIMAGE
  • 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2025.codfw.wmnet with reason: REIMAGE

2021-06-21

  • 23:16 krinkle@deploy1002: Synchronized wmf-config/mc.php: I13646a5557c9 (duration: 00m 55s)
  • 23:12 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I302a71 (duration: 00m 56s)
  • 23:08 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: Idcac4d (duration: 00m 56s)
  • 23:05 krinkle@deploy1002: Synchronized wmf-config/mc.php: I877a3e (duration: 00m 57s)
  • 23:04 krinkle@deploy1002: Synchronized wmf-config/mc.php: Icc2676 (duration: 00m 56s)
  • 22:57 krinkle@deploy1002: Synchronized wmf-config/mc.php: Iea94283c53 (duration: 00m 57s)
  • 22:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: Iea94283c53 (duration: 00m 57s)
  • 22:42 eileen: civicrm revision changed from 0fca489063 to 629bd3b7b7, config revision is 2aed6ff89b
  • 22:41 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=viwiki --fix # T284868 # P16674
  • 22:13 eileen: civicrm revision changed from acbcce94a2 to 0fca489063, config revision is 2aed6ff89b
  • 21:11 sbassett: Deployed security patch for T285190
  • 19:19 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on doh1001.wikimedia.org with reason: temporarily depooling host
  • 19:19 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on doh1001.wikimedia.org with reason: temporarily depooling host
  • 18:41 ppchelko@deploy1002: Synchronized wmf-config/wikitech.php: Replace uses of AbstractBlock::getTarget() T284141 (duration: 00m 58s)
  • 18:30 urbanecm@deploy1002: Synchronized wmf-config/PoolCounterSettings.php: af61f1a: Add pool counter for automated search requests (T284479) (duration: 00m 59s)
  • 18:30 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@40b4b2f]: T273854 Airflow dag to extract and process sparql queries (duration: 07m 11s)
  • 18:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f7db2b9: Enable wikilove on hewikisource (T284864) (duration: 00m 56s)
  • 18:26 brennen: gitlab1001: running ansible for copying latest backup to dedicated folder (T274463)
  • 18:24 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewikisource wikilove # T284864
  • 18:23 urbanecm: Correction: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hiwikisource wikilove # T284864
  • 18:23 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hiwikisource # T284864
  • 18:22 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@40b4b2f]: T273854 Airflow dag to extract and process sparql queries
  • 18:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: dd0fecb: Rename Portal and Portal talk namespaces on viwiki (T284868) (duration: 00m 56s)
  • 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5d8b9df: Disable Education Program namespaces in enwiki (T285193) (duration: 00m 58s)
  • 18:14 urbanecm@deploy1002: Synchronized wmf-config/abusefilter.php: 5a51dd2: Add `managechangetags` to the `abusefilter` group on eswiki (T285167) (duration: 00m 56s)
  • 18:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 219dd5b: eswiki AbuseFilter config changes (T284797; 2/2) (duration: 00m 56s)
  • 18:10 urbanecm@deploy1002: Synchronized wmf-config/abusefilter.php: 219dd5b: eswiki AbuseFilter config changes (T284797; 1/2) (duration: 01m 07s)
  • 17:40 ebernhardson: post-deploy restart airflow-webserver and airflow-scheduler on an-airflow1001
  • 17:32 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2337592]: airflow: expect eventgate canary events in all dcs (duration: 04m 24s)
  • 17:27 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2337592]: airflow: expect eventgate canary events in all dcs
  • 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:32 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:46 papaul: poweroff elastic2043 for maintenance
  • 15:25 hashar: Updated operations-puppet-tests-buster-docker Jenkins job to use latest Docker image https://gerrit.wikimedia.org/r/c/integration/config/+/700648
  • 15:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1009.eqiad.wmnet
  • 15:02 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
  • 15:01 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps1009.eqiad.wmnet
  • 14:57 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
  • 14:57 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
  • 14:52 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
  • 14:52 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 14:47 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
  • 14:44 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 14:40 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 14:39 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet
  • 14:37 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1002.eqiad.wmnet
  • 14:37 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
  • 14:34 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
  • 14:30 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
  • 14:28 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
  • 14:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
  • 14:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1123.eqiad.wmnet with reason: REIMAGE
  • 14:22 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
  • 14:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1123.eqiad.wmnet with reason: REIMAGE
  • 14:21 volans: deployed spicerack release v0.0.54 on the cumin hosts
  • 14:19 XioNoX: reboot scs-c1-codfw - T285229
  • 14:18 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
  • 14:17 XioNoX: reboot scs-a1-codfw - T285229
  • 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1008.eqiad.wmnet
  • 14:16 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
  • 14:14 klausman: starting update of ML team's etcd machines in eqiad
  • 14:14 volans: uploaded spicerack_0.0.54 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 14:11 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
  • 14:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1008.eqiad.wmnet
  • 14:06 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 14:05 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
  • 14:04 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
  • 13:58 XioNoX: reboot scs-eqsin - T285229
  • 13:58 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
  • 13:57 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1006.eqiad.wmnet
  • 13:56 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 13:55 jynus: stopping replication at db1171:s3 at db1123-bin.004363:906878073
  • 13:51 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
  • 13:51 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
  • 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1006.eqiad.wmnet
  • 13:48 XioNoX: reboot scs-ulsfo
  • 13:45 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
  • 13:40 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
  • 13:38 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
  • 13:35 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet
  • 13:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/MobileFrontend/includes/ExtMobileFrontend.php: Backport: Avoid loading the whole entity when it only needs description. (T269960) (duration: 00m 58s)
  • 13:28 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
  • 13:24 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2003.codfw.wmnet
  • 13:21 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2003.codfw.wmnet
  • 13:21 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2002.codfw.wmnet
  • 13:19 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2002.codfw.wmnet
  • 13:17 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2001.codfw.wmnet
  • 13:14 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2001.codfw.wmnet
  • 13:12 elukey: upload istioctl 1.9.5 to {buster,stretch}-wikimedia
  • 13:12 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 40 hosts with reason: Merged broken patch
  • 13:12 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 40 hosts with reason: Merged broken patch
  • 13:09 klausman: starting update of ML team's etcd machines in codfw
  • 12:55 godog: move librenms alerts with "max alerts" == -1 to "interval" being 15m - T285205
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16672 and previous config saved to /var/cache/conftool/dbconfig/20210621-124030-root.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16671 and previous config saved to /var/cache/conftool/dbconfig/20210621-123906-root.json
  • 12:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Wikibase: Backport: Rewrite SerializationModifier to be more efficient (duration: 01m 02s)
  • 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1010.eqiad.wmnet
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16670 and previous config saved to /var/cache/conftool/dbconfig/20210621-122526-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16669 and previous config saved to /var/cache/conftool/dbconfig/20210621-122403-root.json
  • 12:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1010.eqiad.wmnet
  • 12:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2008.codfw.wmnet
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16668 and previous config saved to /var/cache/conftool/dbconfig/20210621-121023-root.json
  • 12:10 godog: bump space for k8s and ops prometheus on prometheus1004 (prometheus1003 has been expanded previously but not logged)
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16667 and previous config saved to /var/cache/conftool/dbconfig/20210621-120859-root.json
  • 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2008.codfw.wmnet
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16665 and previous config saved to /var/cache/conftool/dbconfig/20210621-115519-root.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 T283499', diff saved to https://phabricator.wikimedia.org/P16664 and previous config saved to /var/cache/conftool/dbconfig/20210621-115441-marostegui.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16663 and previous config saved to /var/cache/conftool/dbconfig/20210621-115355-root.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 T283499', diff saved to https://phabricator.wikimedia.org/P16662 and previous config saved to /var/cache/conftool/dbconfig/20210621-115143-marostegui.json
  • 11:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0bf35e0: Disable indexing user (sub)pages and draft-related pages on hrwiki (T284384) (duration: 00m 56s)
  • 11:21 urbanecm@deploy1002: Synchronized logos/config.yaml: 1b97376: Change vi.wikisource logo to the same logo being used at en.wikisource (T284612) (duration: 00m 56s)
  • 11:20 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 1b97376: Change vi.wikisource logo to the same logo being used at en.wikisource (T284612) (duration: 00m 57s)
  • 11:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 464cc0b: ptwikinews: Remove NS ID 102,103 (T285163) (duration: 00m 56s)
  • 11:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Add WMCS public addresses to $wgSoftBlockRanges (duration: 00m 56s)
  • 11:04 jbond@deploy1002: Finished deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 (duration: 02m 53s)
  • 11:01 jbond@deploy1002: Started deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4
  • 10:55 moritzm: restarting FPM on mw canaries to pick up nettle security updates
  • 10:45 volans@deploy1002: Finished deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv (duration: 00m 54s)
  • 10:45 moritzm: installing nettle security updates on buster
  • 10:44 volans@deploy1002: Started deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv
  • 10:44 volans@deploy1002: Finished deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv (duration: 00m 54s)
  • 10:43 volans@deploy1002: Started deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv
  • 10:41 volans@deploy1002: Finished deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv (duration: 00m 50s)
  • 10:41 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:40 volans@deploy1002: Started deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv
  • 10:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:37 jbond@deploy1002: Finished deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 (duration: 02m 22s)
  • 10:36 jbond@deploy1002: Started deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4
  • 10:36 jbond@deploy1002: Finished deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 to netbox-next (duration: 00m 56s)
  • 10:29 jbond@deploy1002: Started deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 to netbox-next
  • 10:27 jbond@deploy1002: Finished deploy [netbox/deploy@6b69f2c]: deploy v2.10.4-wmf4 to netbox-next (duration: 03m 12s)
  • 10:24 jbond@deploy1002: Started deploy [netbox/deploy@6b69f2c]: deploy v2.10.4-wmf4 to netbox-next
  • 10:22 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 02m 22s)
  • 10:20 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
  • 10:19 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 02m 13s)
  • 10:17 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
  • 10:16 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 01m 03s)
  • 10:15 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
  • 10:15 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 01m 30s)
  • 10:13 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
  • 10:13 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 (duration: 03m 10s)
  • 10:10 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4
  • 09:55 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/FlaggedRevs: Backport: Drop LocalFile::getHistory hook handler (T284777 T277883) (duration: 00m 58s)
  • 09:52 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Enable wikisource group as langlink group of sourcewiki (T275958) (duration: 00m 56s)
  • 09:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set $wmgWikibaseTmpSerializeEmptyListsAsObjects to true everywhere (T241422) (duration: 00m 57s)
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16659 and previous config saved to /var/cache/conftool/dbconfig/20210621-094049-root.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1130 with weight 0 T284529', diff saved to https://phabricator.wikimedia.org/P16658 and previous config saved to /var/cache/conftool/dbconfig/20210621-092623-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16657 and previous config saved to /var/cache/conftool/dbconfig/20210621-092545-root.json
  • 09:19 ladsgroup@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 04m 49s)
  • 09:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16656 and previous config saved to /var/cache/conftool/dbconfig/20210621-091041-root.json
  • 09:02 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:56 marostegui: Deploy T266486 T268392 T273360 on db1123
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16655 and previous config saved to /var/cache/conftool/dbconfig/20210621-085538-root.json
  • 08:31 dcausse: depooling wdqs1005 (lag)
  • 07:47 moritzm: updated buster d-i image for Buster 10.10 point release (which included ABI bump for Linux kernel)
  • 07:44 jayme: started debian-weekly-rebuild.service on deneb (it failed due to 404 on snapshots.debian.org yesterday)
  • 06:49 moritzm: installing libwebp security updates on buster
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16654 and previous config saved to /var/cache/conftool/dbconfig/20210621-062156-root.json
  • 06:20 marostegui: Re-add rev_page_id to db1135 T163532 T285149
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 T163532', diff saved to https://phabricator.wikimedia.org/P16653 and previous config saved to /var/cache/conftool/dbconfig/20210621-062014-marostegui.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16652 and previous config saved to /var/cache/conftool/dbconfig/20210621-060652-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16651 and previous config saved to /var/cache/conftool/dbconfig/20210621-055149-root.json
  • 05:50 kart_: cxserver: Added support for Elia MT + Updated to 2021-06-10-074331-production (T276059, T275803, T276246, T283513, T255231, T237028)
  • 05:41 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16650 and previous config saved to /var/cache/conftool/dbconfig/20210621-053645-root.json
  • 05:33 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:31 kormat: stopping replication on db1123 T283131
  • 05:25 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 05:11 kormat@cumin1001: dbctl commit (dc=all): 'Depool db1123 until it's reimaged to buster T284648', diff saved to https://phabricator.wikimedia.org/P16649 and previous config saved to /var/cache/conftool/dbconfig/20210621-051149-kormat.json
  • 05:05 kormat@cumin1001: dbctl commit (dc=all): 'Promote db1157 to s3 master and set section read-write T284648', diff saved to https://phabricator.wikimedia.org/P16648 and previous config saved to /var/cache/conftool/dbconfig/20210621-050506-kormat.json
  • 05:03 kormat@cumin1001: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T284648', diff saved to https://phabricator.wikimedia.org/P16647 and previous config saved to /var/cache/conftool/dbconfig/20210621-050304-kormat.json
  • 05:02 kormat: Starting s3 eqiad failover from db1123 to db1157 - T284648
  • 04:49 kormat@cumin1001: dbctl commit (dc=all): 'Set db1157 with weight 0 T284648', diff saved to https://phabricator.wikimedia.org/P16646 and previous config saved to /var/cache/conftool/dbconfig/20210621-044955-kormat.json
  • 04:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 21 hosts with reason: Master switchover s3 T284648
  • 04:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 21 hosts with reason: Master switchover s3 T284648
  • 04:40 marostegui: Re-add rev_page_id to db1099:3311 T163532 T285149
  • 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 T163532', diff saved to https://phabricator.wikimedia.org/P16645 and previous config saved to /var/cache/conftool/dbconfig/20210621-043941-marostegui.json

2021-06-18

  • 20:55 Krinkle: Remove doc1001:/srv/doc/mediawiki-core/wmf-1.36.0-wmf.31-testing
  • 13:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16640 and previous config saved to /var/cache/conftool/dbconfig/20210618-125306-root.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16639 and previous config saved to /var/cache/conftool/dbconfig/20210618-123802-root.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16638 and previous config saved to /var/cache/conftool/dbconfig/20210618-122526-root.json
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16637 and previous config saved to /var/cache/conftool/dbconfig/20210618-122259-root.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16636 and previous config saved to /var/cache/conftool/dbconfig/20210618-121022-root.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16635 and previous config saved to /var/cache/conftool/dbconfig/20210618-120755-root.json
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16634 and previous config saved to /var/cache/conftool/dbconfig/20210618-115518-root.json
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16633 and previous config saved to /var/cache/conftool/dbconfig/20210618-114015-root.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16631 and previous config saved to /var/cache/conftool/dbconfig/20210618-112739-marostegui.json
  • 09:44 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:21 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:49 XioNoX: eqsin-codfw link re-enabled but drained
  • 08:39 legoktm: finished adding shellbox LVS entry, https://shellbox.svc.eqiad.wmnet:4008/ and https://shellbox.svc.codfw.wmnet:4008/ now work (T281423)
  • 08:30 XioNoX: cr1-codfw# set interfaces xe-5/1/2 disable
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16630 and previous config saved to /var/cache/conftool/dbconfig/20210618-081737-root.json
  • 08:06 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16629 and previous config saved to /var/cache/conftool/dbconfig/20210618-080233-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16628 and previous config saved to /var/cache/conftool/dbconfig/20210618-074729-root.json
  • 07:44 legoktm: restarting pybal on lvs1015, lvs2009 (active) - T281423
  • 07:35 legoktm: restarting pybal on lvs1016, lvs2010 to add shellbox
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16627 and previous config saved to /var/cache/conftool/dbconfig/20210618-073225-root.json
  • 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2010.codfw.wmnet
  • 07:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2010.codfw.wmnet
  • 06:58 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1002.wikimedia.org
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16626 and previous config saved to /var/cache/conftool/dbconfig/20210618-063632-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P16625 and previous config saved to /var/cache/conftool/dbconfig/20210618-062452-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16624 and previous config saved to /var/cache/conftool/dbconfig/20210618-062129-root.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16623 and previous config saved to /var/cache/conftool/dbconfig/20210618-060625-root.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16622 and previous config saved to /var/cache/conftool/dbconfig/20210618-060452-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16621 and previous config saved to /var/cache/conftool/dbconfig/20210618-055122-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16620 and previous config saved to /var/cache/conftool/dbconfig/20210618-054949-root.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165', diff saved to https://phabricator.wikimedia.org/P16619 and previous config saved to /var/cache/conftool/dbconfig/20210618-054841-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16618 and previous config saved to /var/cache/conftool/dbconfig/20210618-054659-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16617 and previous config saved to /var/cache/conftool/dbconfig/20210618-053445-root.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16616 and previous config saved to /var/cache/conftool/dbconfig/20210618-053156-root.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16615 and previous config saved to /var/cache/conftool/dbconfig/20210618-051942-root.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131', diff saved to https://phabricator.wikimedia.org/P16614 and previous config saved to /var/cache/conftool/dbconfig/20210618-051712-marostegui.json
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16613 and previous config saved to /var/cache/conftool/dbconfig/20210618-051652-root.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16612 and previous config saved to /var/cache/conftool/dbconfig/20210618-050148-root.json
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16611 and previous config saved to /var/cache/conftool/dbconfig/20210618-045808-marostegui.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16610 and previous config saved to /var/cache/conftool/dbconfig/20210618-045743-marostegui.json
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16609 and previous config saved to /var/cache/conftool/dbconfig/20210618-045355-marostegui.json

2021-06-17

  • 21:49 legoktm: regenerating pipermail redirects to skip those with duplicate message-ids (T280731)
  • 18:24 ryankemper: T285106 [WDQS] `ryankemper@wdqs2001:~$ sudo depool`
  • 18:01 dancy: Deployed latest scap code to beta cluster
  • 13:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Wikibase/client/includes/ClientHooks.php: Backport: client: Bring back using the client setting for langlink group (T284854) (duration: 00m 58s)
  • 13:28 jbond: add prometheus-jmx-exporter to bullseye-wikimedia
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16604 and previous config saved to /var/cache/conftool/dbconfig/20210617-121146-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16603 and previous config saved to /var/cache/conftool/dbconfig/20210617-120109-root.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16602 and previous config saved to /var/cache/conftool/dbconfig/20210617-115643-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16601 and previous config saved to /var/cache/conftool/dbconfig/20210617-115319-root.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16600 and previous config saved to /var/cache/conftool/dbconfig/20210617-114605-root.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16599 and previous config saved to /var/cache/conftool/dbconfig/20210617-114139-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16598 and previous config saved to /var/cache/conftool/dbconfig/20210617-113816-root.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16597 and previous config saved to /var/cache/conftool/dbconfig/20210617-113101-root.json
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16596 and previous config saved to /var/cache/conftool/dbconfig/20210617-112635-root.json
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P16595 and previous config saved to /var/cache/conftool/dbconfig/20210617-112431-marostegui.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16594 and previous config saved to /var/cache/conftool/dbconfig/20210617-112312-root.json
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16593 and previous config saved to /var/cache/conftool/dbconfig/20210617-111558-root.json
  • 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16592 and previous config saved to /var/cache/conftool/dbconfig/20210617-111026-marostegui.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16591 and previous config saved to /var/cache/conftool/dbconfig/20210617-110808-root.json
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16590 and previous config saved to /var/cache/conftool/dbconfig/20210617-110656-root.json
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16589 and previous config saved to /var/cache/conftool/dbconfig/20210617-110200-marostegui.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16588 and previous config saved to /var/cache/conftool/dbconfig/20210617-105153-root.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16587 and previous config saved to /var/cache/conftool/dbconfig/20210617-103649-root.json
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16586 and previous config saved to /var/cache/conftool/dbconfig/20210617-102145-root.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P16585 and previous config saved to /var/cache/conftool/dbconfig/20210617-101827-marostegui.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16584 and previous config saved to /var/cache/conftool/dbconfig/20210617-100445-root.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16583 and previous config saved to /var/cache/conftool/dbconfig/20210617-094942-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16582 and previous config saved to /var/cache/conftool/dbconfig/20210617-093438-root.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16581 and previous config saved to /var/cache/conftool/dbconfig/20210617-092056-root.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16580 and previous config saved to /var/cache/conftool/dbconfig/20210617-091934-root.json
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P16579 and previous config saved to /var/cache/conftool/dbconfig/20210617-090947-marostegui.json
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16578 and previous config saved to /var/cache/conftool/dbconfig/20210617-090552-root.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16577 and previous config saved to /var/cache/conftool/dbconfig/20210617-085048-root.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16576 and previous config saved to /var/cache/conftool/dbconfig/20210617-084941-root.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16575 and previous config saved to /var/cache/conftool/dbconfig/20210617-083545-root.json
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16574 and previous config saved to /var/cache/conftool/dbconfig/20210617-083438-root.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P16573 and previous config saved to /var/cache/conftool/dbconfig/20210617-083005-marostegui.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P16572 and previous config saved to /var/cache/conftool/dbconfig/20210617-082939-marostegui.json
  • 08:28 elukey: upload istioctl 1.6.14-1 to buster-wikimedia
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16571 and previous config saved to /var/cache/conftool/dbconfig/20210617-082437-root.json
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315', diff saved to https://phabricator.wikimedia.org/P16570 and previous config saved to /var/cache/conftool/dbconfig/20210617-082409-marostegui.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16569 and previous config saved to /var/cache/conftool/dbconfig/20210617-081934-root.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16568 and previous config saved to /var/cache/conftool/dbconfig/20210617-080933-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16567 and previous config saved to /var/cache/conftool/dbconfig/20210617-080430-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16566 and previous config saved to /var/cache/conftool/dbconfig/20210617-075825-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16565 and previous config saved to /var/cache/conftool/dbconfig/20210617-075429-root.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16564 and previous config saved to /var/cache/conftool/dbconfig/20210617-073926-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P16563 and previous config saved to /var/cache/conftool/dbconfig/20210617-073305-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16562 and previous config saved to /var/cache/conftool/dbconfig/20210617-073229-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16561 and previous config saved to /var/cache/conftool/dbconfig/20210617-071726-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16560 and previous config saved to /var/cache/conftool/dbconfig/20210617-070222-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16559 and previous config saved to /var/cache/conftool/dbconfig/20210617-064717-root.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16558 and previous config saved to /var/cache/conftool/dbconfig/20210617-063135-marostegui.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16557 and previous config saved to /var/cache/conftool/dbconfig/20210617-062514-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16556 and previous config saved to /var/cache/conftool/dbconfig/20210617-061010-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16555 and previous config saved to /var/cache/conftool/dbconfig/20210617-055507-root.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16554 and previous config saved to /var/cache/conftool/dbconfig/20210617-054003-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165', diff saved to https://phabricator.wikimedia.org/P16553 and previous config saved to /var/cache/conftool/dbconfig/20210617-053455-marostegui.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16552 and previous config saved to /var/cache/conftool/dbconfig/20210617-053105-root.json
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16551 and previous config saved to /var/cache/conftool/dbconfig/20210617-051601-root.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16550 and previous config saved to /var/cache/conftool/dbconfig/20210617-050057-root.json
  • 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16549 and previous config saved to /var/cache/conftool/dbconfig/20210617-044554-root.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P16548 and previous config saved to /var/cache/conftool/dbconfig/20210617-044146-marostegui.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16547 and previous config saved to /var/cache/conftool/dbconfig/20210617-044132-marostegui.json
  • 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16546 and previous config saved to /var/cache/conftool/dbconfig/20210617-043130-marostegui.json

2021-06-16

  • 21:35 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 21:32 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 17:41 dancy: Reverted Scap release on beta
  • 16:18 topranks: Resetting metric on Telia CCT IC-331929, cr1-codfw and cr3-eqsin.
  • 15:22 dancy: testing upcoming Scap release on beta
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16545 and previous config saved to /var/cache/conftool/dbconfig/20210616-125329-root.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16544 and previous config saved to /var/cache/conftool/dbconfig/20210616-123826-root.json
  • 12:34 kormat: deploying heartbeat service puppet change
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16543 and previous config saved to /var/cache/conftool/dbconfig/20210616-122322-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16541 and previous config saved to /var/cache/conftool/dbconfig/20210616-120818-root.json
  • 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
  • 12:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131', diff saved to https://phabricator.wikimedia.org/P16540 and previous config saved to /var/cache/conftool/dbconfig/20210616-120015-marostegui.json
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16539 and previous config saved to /var/cache/conftool/dbconfig/20210616-112115-root.json
  • 11:20 hnowlan: running `nodetool cleanup` on maps1005
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16538 and previous config saved to /var/cache/conftool/dbconfig/20210616-110612-root.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16537 and previous config saved to /var/cache/conftool/dbconfig/20210616-105108-root.json
  • 10:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1007.eqiad.wmnet with reason: REIMAGE
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16536 and previous config saved to /var/cache/conftool/dbconfig/20210616-103604-root.json
  • 10:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1007.eqiad.wmnet with reason: REIMAGE
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16535 and previous config saved to /var/cache/conftool/dbconfig/20210616-102349-marostegui.json
  • 09:52 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
  • 09:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
  • 09:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
  • 09:50 hnowlan: disabling puppet on maps1* to reparent maps1007 from new master maps1009
  • 09:47 kormat: truncating all pc* tables on pc1010 T282761
  • 09:40 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1009 as pc3 primary T282761 (duration: 00m 59s)
  • 09:04 kormat: Deploying wmfmariadbpy 0.7.1 T284819
  • 09:04 kormat: uploaded wmfmariadbpy 0.7.1 to apt.wm.o
  • 08:24 Amir1: running "update flaggedrevs set fr_quality = 0 where fr_quality != 0;" on all wikis where flagged revs is enabled (T279761)
  • 07:27 dcausse: cleanup old /var/log/airflow/scheduler logs to reclaim space on an-airflow1001
  • 06:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:52 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 05:06 marostegui: Upgrade clouddb1014

2021-06-15

  • 17:54 dancy: testing upcoming Scap release on beta
  • 17:21 mutante: new Wikimedia language "shi" added - Shilha /ˈʃɪlhə/ is a Berber language native to Shilha people. The endonym is Taclḥit /taʃlʜijt/, and in recent English publications the language is often rendered Tashelhiyt or Tashelhit.
  • 17:17 mutante: new Wikimedia language "dag" added - Dagbani (or Dagbane), also known as Dagbanli and Dagbanle, is a Gur language spoken in Ghana.
  • 17:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1002.eqiad.wmnet with reason: REIMAGE
  • 17:09 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1002.eqiad.wmnet with reason: REIMAGE
  • 16:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on an-master1002.eqiad.wmnet with reason: Update operating system to bullseye
  • 16:11 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on an-master1002.eqiad.wmnet with reason: Update operating system to bullseye
  • 14:55 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:25 XioNoX: re-enable cr1-codfw:xe-5/1/2
  • 13:23 marostegui: Upgrade clouddb1018
  • 13:15 effie: enable puppet on canaries
  • 13:10 effie: disable puppet on canaries to deploy 699908
  • 10:45 XioNoX: re-enable cr1-codfw:xe-5/1/2
  • 09:42 XioNoX: cr1-codfw# set interfaces xe-5/1/2 disable
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080', diff saved to https://phabricator.wikimedia.org/P16533 and previous config saved to /var/cache/conftool/dbconfig/20210615-092511-marostegui.json
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318, db2082', diff saved to https://phabricator.wikimedia.org/P16532 and previous config saved to /var/cache/conftool/dbconfig/20210615-092409-marostegui.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P16531 and previous config saved to /var/cache/conftool/dbconfig/20210615-090802-marostegui.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2083', diff saved to https://phabricator.wikimedia.org/P16530 and previous config saved to /var/cache/conftool/dbconfig/20210615-090650-marostegui.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2084', diff saved to https://phabricator.wikimedia.org/P16529 and previous config saved to /var/cache/conftool/dbconfig/20210615-090243-marostegui.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2081', diff saved to https://phabricator.wikimedia.org/P16528 and previous config saved to /var/cache/conftool/dbconfig/20210615-090206-marostegui.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2082', diff saved to https://phabricator.wikimedia.org/P16527 and previous config saved to /var/cache/conftool/dbconfig/20210615-085953-marostegui.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091', diff saved to https://phabricator.wikimedia.org/P16526 and previous config saved to /var/cache/conftool/dbconfig/20210615-085938-marostegui.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080 db2083 db2084 db2091', diff saved to https://phabricator.wikimedia.org/P16525 and previous config saved to /var/cache/conftool/dbconfig/20210615-083233-marostegui.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P16524 and previous config saved to /var/cache/conftool/dbconfig/20210615-082857-marostegui.json
  • 06:10 XioNoX: roll OSPF link-protection to all routers - T167306
  • 02:30 eileen: civicrm revision changed from d9d61dad0b to acbcce94a2, config revision is 2aed6ff89b
  • 01:22 eileen: civicrm revision changed from 28ace1b86f to d9d61dad0b, config revision is 2aed6ff89b
  • 00:37 eileen: civicrm revision changed from 31d07115a0 to 28ace1b86f, config revision is 2aed6ff89b

2021-06-14

  • 21:40 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@baeee47]: T261407 bulk_daemon: Deploy prioritized topics (duration: 00m 49s)
  • 21:40 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@baeee47]: T261407 bulk_daemon: Deploy prioritized topics
  • 19:27 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1003.eqiad.wmnet
  • 19:21 twentyafterfour_: applying hotfix for T284397 and restarting php7.3-fpm on phab1001
  • 18:30 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1003.eqiad.wmnet
  • 17:05 jforrester@deploy1002: Finished deploy [integration/docroot@22061b6]: Actually add mediawiki/tools/api-testing JSDoc to doc.wikimedia for T236915 (duration: 00m 07s)
  • 17:05 jforrester@deploy1002: Started deploy [integration/docroot@22061b6]: Actually add mediawiki/tools/api-testing JSDoc to doc.wikimedia for T236915
  • 16:46 jforrester@deploy1002: Finished deploy [integration/docroot@ca7af97]: Add mediawiki/tools/api-testing JSDoc to doc.wikimedia for T236915 (duration: 00m 07s)
  • 16:46 jforrester@deploy1002: Started deploy [integration/docroot@ca7af97]: Add mediawiki/tools/api-testing JSDoc to doc.wikimedia for T236915
  • 15:56 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1002.eqiad.wmnet
  • 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16521 and previous config saved to /var/cache/conftool/dbconfig/20210614-155258-root.json
  • 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16520 and previous config saved to /var/cache/conftool/dbconfig/20210614-153754-root.json
  • 15:24 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16519 and previous config saved to /var/cache/conftool/dbconfig/20210614-152250-root.json
  • 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1005.eqiad.wmnet
  • 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16518 and previous config saved to /var/cache/conftool/dbconfig/20210614-150747-root.json
  • 15:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1005.eqiad.wmnet
  • 15:04 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1002.eqiad.wmnet
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1004.eqiad.wmnet
  • 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1004.eqiad.wmnet
  • 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 10%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16517 and previous config saved to /var/cache/conftool/dbconfig/20210614-145243-root.json
  • 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1003.eqiad.wmnet
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16516 and previous config saved to /var/cache/conftool/dbconfig/20210614-145039-root.json
  • 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1003.eqiad.wmnet
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16515 and previous config saved to /var/cache/conftool/dbconfig/20210614-144130-marostegui.json
  • 14:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1002.eqiad.wmnet
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16514 and previous config saved to /var/cache/conftool/dbconfig/20210614-143536-root.json
  • 14:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1002.eqiad.wmnet
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16513 and previous config saved to /var/cache/conftool/dbconfig/20210614-143224-root.json
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16512 and previous config saved to /var/cache/conftool/dbconfig/20210614-143211-root.json
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1001.eqiad.wmnet
  • 14:27 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate CentralNotice{BannerHistory,Impression} to EventGate on all wikis - T271168 (duration: 00m 57s)
  • 14:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1001.eqiad.wmnet
  • 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2007.codfw.wmnet
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16511 and previous config saved to /var/cache/conftool/dbconfig/20210614-142032-root.json
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16510 and previous config saved to /var/cache/conftool/dbconfig/20210614-142014-root.json
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16509 and previous config saved to /var/cache/conftool/dbconfig/20210614-141720-root.json
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16508 and previous config saved to /var/cache/conftool/dbconfig/20210614-141707-root.json
  • 14:17 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate CentralNotice{BannerHistory,Impression} to EventGate on testwiki - T271168 (duration: 00m 57s)
  • 14:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2007.codfw.wmnet
  • 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2006.codfw.wmnet
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16507 and previous config saved to /var/cache/conftool/dbconfig/20210614-140529-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16506 and previous config saved to /var/cache/conftool/dbconfig/20210614-140511-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16505 and previous config saved to /var/cache/conftool/dbconfig/20210614-140217-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16504 and previous config saved to /var/cache/conftool/dbconfig/20210614-140203-root.json
  • 14:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2006.codfw.wmnet
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16503 and previous config saved to /var/cache/conftool/dbconfig/20210614-135456-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 10%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16502 and previous config saved to /var/cache/conftool/dbconfig/20210614-135025-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16501 and previous config saved to /var/cache/conftool/dbconfig/20210614-135007-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16500 and previous config saved to /var/cache/conftool/dbconfig/20210614-134713-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16499 and previous config saved to /var/cache/conftool/dbconfig/20210614-134700-root.json
  • 13:43 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16498 and previous config saved to /var/cache/conftool/dbconfig/20210614-133953-root.json
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16497 and previous config saved to /var/cache/conftool/dbconfig/20210614-133801-marostegui.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16496 and previous config saved to /var/cache/conftool/dbconfig/20210614-133503-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16495 and previous config saved to /var/cache/conftool/dbconfig/20210614-133442-root.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16494 and previous config saved to /var/cache/conftool/dbconfig/20210614-133210-root.json
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 10%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16493 and previous config saved to /var/cache/conftool/dbconfig/20210614-133156-root.json
  • 13:29 effie: restart memcached on codfw
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16492 and previous config saved to /var/cache/conftool/dbconfig/20210614-132449-root.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312 db1170:3317 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16491 and previous config saved to /var/cache/conftool/dbconfig/20210614-132235-marostegui.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16490 and previous config saved to /var/cache/conftool/dbconfig/20210614-132000-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16489 and previous config saved to /var/cache/conftool/dbconfig/20210614-131938-root.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16488 and previous config saved to /var/cache/conftool/dbconfig/20210614-130946-root.json
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16487 and previous config saved to /var/cache/conftool/dbconfig/20210614-130723-marostegui.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16486 and previous config saved to /var/cache/conftool/dbconfig/20210614-130547-root.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16485 and previous config saved to /var/cache/conftool/dbconfig/20210614-130435-root.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16484 and previous config saved to /var/cache/conftool/dbconfig/20210614-125442-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16483 and previous config saved to /var/cache/conftool/dbconfig/20210614-125043-root.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16482 and previous config saved to /var/cache/conftool/dbconfig/20210614-124931-root.json
  • 12:37 XioNoX: configure OSPF link-protection on cr3/4-ulsfo - T167306
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16481 and previous config saved to /var/cache/conftool/dbconfig/20210614-123539-root.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1033 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16480 and previous config saved to /var/cache/conftool/dbconfig/20210614-123512-marostegui.json
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16479 and previous config saved to /var/cache/conftool/dbconfig/20210614-123427-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Restore es1028 original weight', diff saved to https://phabricator.wikimedia.org/P16478 and previous config saved to /var/cache/conftool/dbconfig/20210614-122322-marostegui.json
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to es1028 while es1034 gets upgraded', diff saved to https://phabricator.wikimedia.org/P16477 and previous config saved to /var/cache/conftool/dbconfig/20210614-122242-marostegui.json
  • 12:22 dcausse: re-pooling wdqs1012
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1034 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16476 and previous config saved to /var/cache/conftool/dbconfig/20210614-122212-marostegui.json
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16475 and previous config saved to /var/cache/conftool/dbconfig/20210614-122036-root.json
  • 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2005.codfw.wmnet
  • 12:17 XioNoX: configure OSPF link-protection on cr3-ulsfo:xe-0/1/1 - T167306
  • 12:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2005.codfw.wmnet
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P16474 and previous config saved to /var/cache/conftool/dbconfig/20210614-121101-marostegui.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16473 and previous config saved to /var/cache/conftool/dbconfig/20210614-121031-marostegui.json
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2004.codfw.wmnet
  • 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2004.codfw.wmnet
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16472 and previous config saved to /var/cache/conftool/dbconfig/20210614-120112-marostegui.json
  • 11:28 effie: restart memcached on mc2019
  • 11:09 effie: restart memcached on codfw memcached gutter pool (mc-gp2* hosts)
  • 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2003.codfw.wmnet
  • 10:52 topranks: T283163: Adding "metric-out minimum-igp" to all internal/Confed BGP groups on CR routers.
  • 10:46 effie: enable puppet on mc*
  • 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2003.codfw.wmnet
  • 10:39 effie: disable puppet on mc* hosts
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2001.codfw.wmnet
  • 10:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2001.codfw.wmnet
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16471 and previous config saved to /var/cache/conftool/dbconfig/20210614-101839-root.json
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16469 and previous config saved to /var/cache/conftool/dbconfig/20210614-100336-root.json
  • 09:56 jbond@deploy1002: Finished deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 (duration: 02m 37s)
  • 09:54 jbond@deploy1002: Started deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16467 and previous config saved to /var/cache/conftool/dbconfig/20210614-094832-root.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16466 and previous config saved to /var/cache/conftool/dbconfig/20210614-093329-root.json
  • 09:22 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for schema change', diff saved to https://phabricator.wikimedia.org/P16465 and previous config saved to /var/cache/conftool/dbconfig/20210614-092234-marostegui.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16464 and previous config saved to /var/cache/conftool/dbconfig/20210614-092125-root.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16463 and previous config saved to /var/cache/conftool/dbconfig/20210614-090622-root.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16462 and previous config saved to /var/cache/conftool/dbconfig/20210614-085118-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16461 and previous config saved to /var/cache/conftool/dbconfig/20210614-083614-root.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 for schema change', diff saved to https://phabricator.wikimedia.org/P16460 and previous config saved to /var/cache/conftool/dbconfig/20210614-081239-marostegui.json
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16459 and previous config saved to /var/cache/conftool/dbconfig/20210614-081031-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2148', diff saved to https://phabricator.wikimedia.org/P16458 and previous config saved to /var/cache/conftool/dbconfig/20210614-080552-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16456 and previous config saved to /var/cache/conftool/dbconfig/20210614-075528-root.json
  • 07:51 marostegui: Depool clouddb1013 to upgrade mysql
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16455 and previous config saved to /var/cache/conftool/dbconfig/20210614-074024-root.json
  • 07:30 marostegui: Reboot db2148 T284852
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2148 T284852', diff saved to https://phabricator.wikimedia.org/P16454 and previous config saved to /var/cache/conftool/dbconfig/20210614-072930-marostegui.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16453 and previous config saved to /var/cache/conftool/dbconfig/20210614-072520-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P16452 and previous config saved to /var/cache/conftool/dbconfig/20210614-071839-marostegui.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16451 and previous config saved to /var/cache/conftool/dbconfig/20210614-071742-root.json
  • 07:15 dcausse: restart blazegraph and depool wdqs1012
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16450 and previous config saved to /var/cache/conftool/dbconfig/20210614-070238-root.json
  • 07:01 moritzm: restarting mw canaries to pick up libwebp security updates
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16449 and previous config saved to /var/cache/conftool/dbconfig/20210614-064734-root.json
  • 06:39 moritzm: installing libwebp security updates on buster
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16448 and previous config saved to /var/cache/conftool/dbconfig/20210614-063231-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 for schema change', diff saved to https://phabricator.wikimedia.org/P16447 and previous config saved to /var/cache/conftool/dbconfig/20210614-062554-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 100%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16446 and previous config saved to /var/cache/conftool/dbconfig/20210614-061226-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16445 and previous config saved to /var/cache/conftool/dbconfig/20210614-060119-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 75%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16444 and previous config saved to /var/cache/conftool/dbconfig/20210614-055723-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16443 and previous config saved to /var/cache/conftool/dbconfig/20210614-054615-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 50%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16442 and previous config saved to /var/cache/conftool/dbconfig/20210614-054219-root.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16441 and previous config saved to /var/cache/conftool/dbconfig/20210614-053112-root.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 25%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16440 and previous config saved to /var/cache/conftool/dbconfig/20210614-052715-root.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P16439 and previous config saved to /var/cache/conftool/dbconfig/20210614-051930-marostegui.json
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16438 and previous config saved to /var/cache/conftool/dbconfig/20210614-051608-root.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P16437 and previous config saved to /var/cache/conftool/dbconfig/20210614-051522-marostegui.json

2021-06-12

  • 13:49 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: alert noise, no impact, x2 is unused
  • 13:49 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: alert noise, no impact, x2 is unused

2021-06-11

  • 23:37 mutante: removing firewall hole for mgmt networks to install* because it turned out it can't be used for firmware upgrades
  • 22:08 brennen: gitlab.wikimedia.org currently up with recommended config applied; test data deleted; users can register but not create projects. brennen, dancy, and thcipriani currently marked as admins. may need to reset data again, but hopefully not.
  • 21:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2014.codfw.wmnet with reason: REIMAGE
  • 21:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2014.codfw.wmnet with reason: REIMAGE
  • 21:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2013.codfw.wmnet with reason: REIMAGE
  • 20:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2013.codfw.wmnet with reason: REIMAGE
  • 20:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2012.codfw.wmnet with reason: REIMAGE
  • 20:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2012.codfw.wmnet with reason: REIMAGE
  • 19:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2011.codfw.wmnet with reason: REIMAGE
  • 19:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2011.codfw.wmnet with reason: REIMAGE
  • 16:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1004
  • 16:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1004
  • 15:01 reedy@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/MediaSearch/extension.json: Make MediaSearch default search experience for all users (duration: 00m 57s)
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16432 and previous config saved to /var/cache/conftool/dbconfig/20210611-150018-root.json
  • 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16431 and previous config saved to /var/cache/conftool/dbconfig/20210611-144514-root.json
  • 14:44 mbsantos@deploy1002: Finished deploy [tilerator/deploy@6bfdab5]: (no justification provided) (duration: 00m 05s)
  • 14:44 mbsantos@deploy1002: Started deploy [tilerator/deploy@6bfdab5]: (no justification provided)
  • 14:43 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@5d7c993]: (no justification provided) (duration: 00m 05s)
  • 14:42 mbsantos@deploy1002: Started deploy [kartotherian/deploy@5d7c993]: (no justification provided)
  • 14:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
  • 14:36 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
  • 14:35 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:35 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:34 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:34 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
  • 14:33 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:33 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:32 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:31 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16430 and previous config saved to /var/cache/conftool/dbconfig/20210611-143010-root.json
  • 14:22 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:22 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:20 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:20 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16429 and previous config saved to /var/cache/conftool/dbconfig/20210611-141506-root.json
  • 13:53 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
  • 13:53 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
  • 13:53 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16428 and previous config saved to /var/cache/conftool/dbconfig/20210611-135248-marostegui.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1153', diff saved to https://phabricator.wikimedia.org/P16427 and previous config saved to /var/cache/conftool/dbconfig/20210611-135036-marostegui.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1153 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16426 and previous config saved to /var/cache/conftool/dbconfig/20210611-133527-marostegui.json
  • 10:46 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 07:29 moritzm: restarting archiva to pick up OpenJDK security updates
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
  • 07:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
  • 06:56 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:56 elukey: rm -rf empty dir /etc/apache2/sites-enabled/.links2 on webperf1001 to avoid puppet changes at every run
  • 05:47 elukey: run systemctl reset-failed ifup@en5.service on doh1001 - T273026
  • 01:10 eileen: process-control config revision is 2aed6ff89b

2021-06-10

  • 23:29 derick@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Citoid/modules/ve/ve.ui.CitoidInspector.js: Backport: CitoidInspector: rename getParameterNames to getOrderedParameterNames (T284786) (duration: 00m 57s)
  • 21:40 urbanecm: End of urbanecm@mwmaint1002:~$ foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php discussiontools # T282699
  • 21:36 urbanecm: Start of urbanecm@mwmaint1002:~$ foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php discussiontools # T282699
  • 21:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=testwiki discussiontools # T282699
  • 20:13 mutante: installed tftp client on install1003 for debugging
  • 20:00 jhuneidi@deploy1002: Pruned MediaWiki: 1.37.0-wmf.5 (duration: 03m 33s)
  • 19:31 ryankemper: T265547 Cleanup following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/698025: `sudo -E cumin -b 5 'P:analytics::cluster::elasticsearch' 'sudo rm -rfv /etc/mjolnir /srv/deployment/search/mjolnir'`
  • 19:09 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.9 refs T281150
  • 18:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/WikimediaMaintenance/dumpInterwiki.php: b21904e: Remove sep11 interwiki link from dumpinterwiki.php (duration: 01m 08s)
  • 18:45 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 23s)
  • 18:39 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 03s)
  • 18:38 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/UniversalLanguageSelector/resources/js/ext.uls.launch.js: 8aeab13: Fire language change hook (T280770) (duration: 01m 07s)
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: d26968c: wgWelcomeSurveyExperimentalGroups: Use new syntax in CS.php (T284597; T284735) (duration: 01m 08s)
  • 17:11 moritzm: updating bullseye installer image to latest daily image (kernel ABI changed again) T275873
  • 17:09 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:06 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 16:53 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 16:51 moritzm: installing rails security updates
  • 16:37 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: no-op for Beta I2a42c222003 (duration: 01m 07s)
  • 16:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:24 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 15:09 papaul: power down ms-be2038 for BBU replacement
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16417 and previous config saved to /var/cache/conftool/dbconfig/20210610-123201-root.json
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16416 and previous config saved to /var/cache/conftool/dbconfig/20210610-121657-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 60%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16415 and previous config saved to /var/cache/conftool/dbconfig/20210610-120153-root.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16414 and previous config saved to /var/cache/conftool/dbconfig/20210610-114650-root.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 40%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16413 and previous config saved to /var/cache/conftool/dbconfig/20210610-113146-root.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 30%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16412 and previous config saved to /var/cache/conftool/dbconfig/20210610-111643-root.json
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 20%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16411 and previous config saved to /var/cache/conftool/dbconfig/20210610-110139-root.json
  • 11:00 jbond@deploy1002: Finished deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 to netbox-next (duration: 00m 53s)
  • 10:59 jbond@deploy1002: Started deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 to netbox-next
  • 10:47 topranks: T283163: Adding "metric-out minimum-igp" to BGP group Confed_eqord on eqiad, codfw and eqdfw CRs.
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16410 and previous config saved to /var/cache/conftool/dbconfig/20210610-104635-root.json
  • 10:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/WikiEditor/modules/jquery.wikiEditor.js: 8a17c43: Fix call to renamed var (T284716) (duration: 01m 25s)
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 5%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16409 and previous config saved to /var/cache/conftool/dbconfig/20210610-103132-root.json
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16408 and previous config saved to /var/cache/conftool/dbconfig/20210610-103032-marostegui.json
  • 10:29 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:28 kormat: running optimize tables against pc1009 (pc3) T282761
  • 10:25 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:21 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16407 and previous config saved to /var/cache/conftool/dbconfig/20210610-101858-root.json
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16406 and previous config saved to /var/cache/conftool/dbconfig/20210610-100355-root.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 60%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16405 and previous config saved to /var/cache/conftool/dbconfig/20210610-094851-root.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16404 and previous config saved to /var/cache/conftool/dbconfig/20210610-093346-root.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16402 and previous config saved to /var/cache/conftool/dbconfig/20210610-093003-marostegui.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16401 and previous config saved to /var/cache/conftool/dbconfig/20210610-092246-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 40%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16399 and previous config saved to /var/cache/conftool/dbconfig/20210610-091842-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 30%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16398 and previous config saved to /var/cache/conftool/dbconfig/20210610-090345-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 30%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16397 and previous config saved to /var/cache/conftool/dbconfig/20210610-090339-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 20%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16396 and previous config saved to /var/cache/conftool/dbconfig/20210610-084841-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 20%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16395 and previous config saved to /var/cache/conftool/dbconfig/20210610-084835-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16394 and previous config saved to /var/cache/conftool/dbconfig/20210610-083338-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16393 and previous config saved to /var/cache/conftool/dbconfig/20210610-083332-root.json
  • 08:25 volans: uploaded spicerack_0.0.53 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 5%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16392 and previous config saved to /var/cache/conftool/dbconfig/20210610-081834-root.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 5%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16391 and previous config saved to /var/cache/conftool/dbconfig/20210610-081828-root.json
  • 08:17 marostegui: Drop several grants from labswiki (wikitech) T282074
  • 07:57 jynus: reset-failed on cumin1001 after backup rerun
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P16389 and previous config saved to /var/cache/conftool/dbconfig/20210610-075702-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16388 and previous config saved to /var/cache/conftool/dbconfig/20210610-075247-marostegui.json
  • 07:44 jynus: retrying s6 snapshots on eqiad, acking daemon failure
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16387 and previous config saved to /var/cache/conftool/dbconfig/20210610-073727-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16386 and previous config saved to /var/cache/conftool/dbconfig/20210610-072224-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16385 and previous config saved to /var/cache/conftool/dbconfig/20210610-070720-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16384 and previous config saved to /var/cache/conftool/dbconfig/20210610-065217-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16383 and previous config saved to /var/cache/conftool/dbconfig/20210610-064916-root.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16382 and previous config saved to /var/cache/conftool/dbconfig/20210610-063745-marostegui.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16381 and previous config saved to /var/cache/conftool/dbconfig/20210610-063412-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16380 and previous config saved to /var/cache/conftool/dbconfig/20210610-061909-root.json
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16379 and previous config saved to /var/cache/conftool/dbconfig/20210610-061806-root.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16378 and previous config saved to /var/cache/conftool/dbconfig/20210610-060405-root.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16377 and previous config saved to /var/cache/conftool/dbconfig/20210610-060302-root.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16376 and previous config saved to /var/cache/conftool/dbconfig/20210610-055327-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16375 and previous config saved to /var/cache/conftool/dbconfig/20210610-055037-root.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16374 and previous config saved to /var/cache/conftool/dbconfig/20210610-054802-root.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16373 and previous config saved to /var/cache/conftool/dbconfig/20210610-054759-root.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16372 and previous config saved to /var/cache/conftool/dbconfig/20210610-053534-root.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16371 and previous config saved to /var/cache/conftool/dbconfig/20210610-053259-root.json
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16370 and previous config saved to /var/cache/conftool/dbconfig/20210610-053255-root.json
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16369 and previous config saved to /var/cache/conftool/dbconfig/20210610-052421-marostegui.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16368 and previous config saved to /var/cache/conftool/dbconfig/20210610-052030-root.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16367 and previous config saved to /var/cache/conftool/dbconfig/20210610-052017-marostegui.json
  • 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16366 and previous config saved to /var/cache/conftool/dbconfig/20210610-050526-root.json

2021-06-09

  • 22:12 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh1002.wikimedia.org
  • 22:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1002.wikimedia.org
  • 21:59 dzahn@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host doh1002.wikimedia.org
  • 21:53 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1002.wikimedia.org
  • 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh1001.wikimedia.org
  • 21:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1001.wikimedia.org
  • 21:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/DiscussionTools/modules/dt-ve/CommentTargetWidget.less: Backport: Update surface styles for VE changes (T284567) (duration: 01m 14s)
  • 21:40 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/includes/language/LanguageConverter.php: Backport: Revert "Add type hint to constructor of LanguageConverter" (T284685) (duration: 01m 24s)
  • 21:08 mutante: rsyncing static-bugzilla HTML from miscweb1002 to deploy1002
  • 21:00 mutante: deploy1002 - creating temp dir /srv/miscweb to rsync static-bugzilla data to, coming from miscweb1002 T281538
  • 20:36 mutante: deployed temp ferm change on deployment servers to let miscweb dump data, puppetized. scap pull from mwdebug1001 works, deployment good to go
  • 19:08 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.9 refs T281150 (duration: 01m 07s)
  • 19:06 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.9 refs T281150
  • 18:07 Krinkle: krinkle@mwmaint1002$ mwscript deleteEqualMessages.php (foreachwiki)
  • 17:52 Krinkle: krinkle@mwmaint1002$ mwscript deleteEqualMessages.php --wiki rmywiki
  • 17:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudmetrics1002.eqiad.wmnet
  • 17:32 aborrero@cumin1001: START - Cookbook sre.hosts.remove-downtime for cloudmetrics1002.eqiad.wmnet
  • 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
  • 17:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
  • 17:16 jayme: updated python3-docker-report to 0.0.12 on chartmuseum2001.codfw.wmnet,chartmuseum1001.eqiad.wmnet,deneb.codfw.wmnet,registry[2003-2008].codfw.wmnet,registry[1003-1004].eqiad.wmnet
  • 16:35 jayme: import docker-report 0.0.12 into buster-wikimedia
  • 15:37 hnowlan: rebuilding maps2009 as buster master
  • 15:08 vgutierrez: restarting acme-chief on acmechief1001
  • 15:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
  • 15:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
  • 15:01 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 55s)
  • 15:00 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
  • 14:57 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 04s)
  • 14:57 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
  • 14:51 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 15s)
  • 14:50 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
  • 14:45 moritzm: installing postgresql 9.6 security updates on stretch
  • 14:37 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate WMDEBanner* schemas to EventPlatform on all wikis - T282562 (duration: 01m 06s)
  • 14:33 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate LandingPageImpression schema to EventPlatform on all wikis - T282855 (duration: 01m 06s)
  • 14:23 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate LandingPageImpression schema to EventPlatform on testwiki - T282855 (duration: 01m 07s)
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16358 and previous config saved to /var/cache/conftool/dbconfig/20210609-141807-root.json
  • 14:08 hnowlan@puppetmaster1001: conftool action : set/weight=0; selector: name=maps2009.codfw.wmnet
  • 14:08 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
  • 13:59 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate WMDEBanner* schemas to EventPlatform on testwiki - T282562 (duration: 01m 08s)
  • 13:56 XioNoX: upgrade Routinator 3000 to 0.9.0 on rpki1001 - T282469
  • 13:54 XioNoX: Add Routinator 3000 0.9.0 to the APT repo - T282469
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16356 and previous config saved to /var/cache/conftool/dbconfig/20210609-134800-root.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16355 and previous config saved to /var/cache/conftool/dbconfig/20210609-133257-root.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16354 and previous config saved to /var/cache/conftool/dbconfig/20210609-132958-marostegui.json
  • 13:12 moritzm: installing nginx security updates
  • 13:10 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 02m 26s)
  • 13:07 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
  • 13:07 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 00m 10s)
  • 13:07 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
  • 13:07 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 01m 14s)
  • 13:05 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16351 and previous config saved to /var/cache/conftool/dbconfig/20210609-130114-root.json
  • 12:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
  • 12:47 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: roll back to HEAD~1 (duration: 00m 53s)
  • 12:46 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: roll back to HEAD~1
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16350 and previous config saved to /var/cache/conftool/dbconfig/20210609-124610-root.json
  • 12:43 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 28s)
  • 12:42 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 12:42 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 01m 08s)
  • 12:41 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 12:41 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 47s)
  • 12:40 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 12:39 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 41s)
  • 12:39 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16349 and previous config saved to /var/cache/conftool/dbconfig/20210609-123615-root.json
  • 12:35 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
  • 12:33 godog: lists1001:rm /var/lib/prometheus/node.d/mailman_queues.prom
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16348 and previous config saved to /var/cache/conftool/dbconfig/20210609-123106-root.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16347 and previous config saved to /var/cache/conftool/dbconfig/20210609-122111-root.json
  • 12:18 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 03m 38s)
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16345 and previous config saved to /var/cache/conftool/dbconfig/20210609-121603-root.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P16344 and previous config saved to /var/cache/conftool/dbconfig/20210609-121501-marostegui.json
  • 12:14 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 12:13 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 53s)
  • 12:12 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 12:10 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 44s)
  • 12:09 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 12:09 hnowlan: running `nodetool decommission` on maps2009
  • 12:06 hnowlan: stopped tilerator on maps2009
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16343 and previous config saved to /var/cache/conftool/dbconfig/20210609-120608-root.json
  • 12:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps2009.codfw.wmnet with reason: Postgis version juggling
  • 12:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps2009.codfw.wmnet with reason: Postgis version juggling
  • 12:04 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
  • 12:03 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 06s)
  • 12:03 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 12:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ac43baa: d185728: WelcomeSurveyExperimentalGroups: Use new syntax (T284599) (duration: 01m 19s)
  • 11:59 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 54s)
  • 11:58 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 11:54 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 41s)
  • 11:54 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 11:53 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 03m 11s)
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16342 and previous config saved to /var/cache/conftool/dbconfig/20210609-115104-root.json
  • 11:50 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 11:49 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 02m 16s)
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P16341 and previous config saved to /var/cache/conftool/dbconfig/20210609-114944-marostegui.json
  • 11:47 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 11:47 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 05s)
  • 11:46 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 11:46 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 53s)
  • 11:45 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 11:40 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: redeploy HEAD~1 (duration: 01m 55s)
  • 11:38 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: redeploy HEAD~1
  • 11:36 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: redeploy HEAD~1 (duration: 00m 54s)
  • 11:35 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: redeploy HEAD~1
  • 11:34 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: re-try (duration: 02m 23s)
  • 11:32 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: re-try
  • 11:32 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: re-try (duration: 00m 59s)
  • 11:31 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: re-try
  • 11:27 jbond: drop keep_env from sudo config - #T275852
  • 11:22 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 43s)
  • 11:22 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 11:21 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 01m 15s)
  • 11:20 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 11:11 awight: EU deployment window complete
  • 11:10 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set wgAutoConfirmCount to 10 for enwikisource (T284627) (duration: 02m 04s)
  • 10:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1130.eqiad.wmnet with reason: REIMAGE
  • 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1130.eqiad.wmnet with reason: REIMAGE
  • 10:15 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 53s)
  • 10:14 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 10:13 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 05m 41s)
  • 10:07 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 10:06 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 38s)
  • 10:06 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 T283235', diff saved to https://phabricator.wikimedia.org/P16337 and previous config saved to /var/cache/conftool/dbconfig/20210609-100423-marostegui.json
  • 10:00 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 48s)
  • 09:59 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 09:58 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on schema* after switch towards nginx-light T164456
  • 07:54 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:16 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 06:26 XioNoX: Add 185.71.138.0/24 to network::external and diffscan - T252132
  • 06:12 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16334 and previous config saved to /var/cache/conftool/dbconfig/20210609-053213-root.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16333 and previous config saved to /var/cache/conftool/dbconfig/20210609-051710-root.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16332 and previous config saved to /var/cache/conftool/dbconfig/20210609-050206-root.json
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16331 and previous config saved to /var/cache/conftool/dbconfig/20210609-044703-root.json
  • 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 to remove rev_page_id index T163532', diff saved to https://phabricator.wikimedia.org/P16330 and previous config saved to /var/cache/conftool/dbconfig/20210609-044428-marostegui.json
  • 04:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:30 eileen: civicrm revision changed from eac772e9c9 to 31d07115a0, config revision is 931a941a5e
  • 03:01 Amir1: mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki=aawiktionary --site-group wiktionary (T284444)
  • 02:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 02:56 Amir1: clean up of the rest of mbox files (except arbcom) (T282303)
  • 02:55 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 02:49 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1010.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "xfer categories following reimage" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
  • 02:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 02:39 ryankemper: T280382 Re-enabled puppet on `wdqs1010`
  • 01:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 00:37 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Wikisource OCR on select Wikisources (T283898) (duration: 01m 31s)
  • 00:00 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1010.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring skolemized wikidata.jnl so we can reimage wdqs1009" --blazegraph_instance blazegraph --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
  • 00:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer

2021-06-08

  • 22:36 krinkle@deploy1002: Finished deploy [integration/docroot@d4c9e08]: (no justification provided) (duration: 00m 08s)
  • 22:36 krinkle@deploy1002: Started deploy [integration/docroot@d4c9e08]: (no justification provided)
  • 22:21 ryankemper: T284479 Block put back in place. We're back to expected traffic levels. We'll need a more granular mitigation in place before we can lift this block going forward.
  • 22:15 ryankemper: T284479 Successful puppet run on `cp3052`, proceeding to rest of `A:cp-text`: `sudo cumin -b 19 'A:cp-text' 'run-puppet-agent -q'`
  • 22:14 ryankemper: T284479 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/698850, running puppet on `cp3052.esams.wmnet`
  • 22:10 ryankemper: T284479 Yup, more than enough evidence of a strong upward spike now. Proceeding to revert
  • 22:10 ryankemper: T284479 Already starting to see a large upward spike in requests. Doing a quick sanity check to make sure this is out of the ordinary but I'll likely be putting the block back in place shortly
  • 22:09 ryankemper: T284479 Puppet run complete across all of `cp-text`. Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?viewPanel=47&orgId=1&from=now-1h&to=now over the next few minutes to check for a large spike in `full_text` and `entity_full_text` queries
  • 22:03 ryankemper: T284479 Successful puppet run on `cp3052`, proceeding to rest of `A:cp-text`: `sudo cumin -b 15 'A:cp-text' 'run-puppet-agent -q'`
  • 22:01 ryankemper: T284479 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/698849, running puppet on `cp3052.esams.wmnet`
  • 21:59 ryankemper: T284479 Prior context: We put a block on a range of Google App Engine IPs yesterday to protect CirrusSearch from a bad actor; now we're going to try lifting the block to see whether we're still getting slammed with traffic
  • 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1009.eqiad.wmnet with reason: REIMAGE
  • 21:42 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1009.eqiad.wmnet with reason: REIMAGE
  • 21:29 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1009.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_1009`
  • 21:27 ryankemper: T280382 Disabled puppet on `wdqs1010` out of abundance of caution; will re-enable after wdqs1009 is reimaged and xfer back is complete
  • 21:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:38 bblack: authdns1001: update gdnsd to 3.7.0-2~wmf1
  • 20:18 bblack: authdns2001: update gdnsd to 3.7.0-2~wmf1
  • 19:55 bblack: dns[1235]002: update gdnsd to 3.7.0-2~wmf1
  • 19:53 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.9 refs T281150
  • 19:46 bblack: dns[1235]001: update gdnsd to 3.7.0-2~wmf1
  • 19:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:36 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 19:36 ryankemper: T280382 Cancelling the data-transfer run to restart it; realized that the cookbook will start up the `wdqs-updater` again so will locally hack the cookbook on `cumin1001` to prevent that
  • 19:32 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Echo/modules/nojs/mw.echo.alert.monobook.less: Backport: Fix MonoBook orange banner hover styles (T284496) (duration: 01m 08s)
  • 19:26 bblack: dns400[12]: update gdnsd to 3.7.0-3~wmf1
  • 19:25 bblack: apt: update gdnsd package to gdnsd-3.7.0-2~wmf1 (fix systemd reload issues)
  • 19:20 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1009.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring skolemized wikidata.jnl so we can reimage wdqs1009" --blazegraph_instance blazegraph --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
  • 19:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:18 ryankemper: T280382 `sudo systemctl stop wdqs-updater wdqs-blazegraph` on `wdqs1010` in preparation for transfer
  • 19:08 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo pool` (all caught up on lag)
  • 18:47 bblack: dns4001: update gdnsd to 3.7.0-1~wmf1
  • 18:43 bblack: apt: update gdnsd package to gdnsd-3.7.0-1~wmf1
  • 17:49 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:36 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:25 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:10 elukey: fix dbstore1007's ip address in analytics-in4 on cr{1,2}-eqiad
  • 17:06 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.9 refs T281150 (duration: 34m 12s)
  • 16:32 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.9 refs T281150
  • 16:27 papaul: powerdown moss-fe2002 for relocation
  • 16:06 papaul: powerdown ms-backup2002 for relocation
  • 16:02 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:40 papaul: powerdown ms-be2061 for relocation
  • 15:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
  • 15:33 papaul: powerdown thanos-fe2003 for relocation
  • 15:23 Krinkle: mwmaint1002: Running purge-parsercache-now.php on server 4/4 (pc1009) ref P16060, T280605, T282761.
  • 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc2009.codfw.wmnet,pc1009.eqiad.wmnet with reason: Purging parsercache pc3 T282761
  • 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc2009.codfw.wmnet,pc1009.eqiad.wmnet with reason: Purging parsercache pc3 T282761
  • 15:13 papaul: powerdown cp2034 for relocation
  • 15:04 papaul: powerdown cp2033 for relocation
  • 14:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
  • 14:43 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on testreduce1001/scandium after switch towards nginx-light T164456
  • 14:08 marostegui: Restart sanitarium hosts (db2094, db2095, db1154, db1155) to pick up new filters T284106
  • 14:05 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc3 master T282761 (duration: 00m 57s)
  • 14:05 kormat: setting pc1010 as pc3 primary T282761
  • 13:51 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 42s)
  • 13:51 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 13:48 otto@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 13:41 otto@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 13:40 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 47s)
  • 13:39 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 13:36 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 01m 03s)
  • 13:35 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 13:33 otto@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - otto@cumin1001
  • 13:22 otto@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - otto@cumin1001
  • 12:15 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1008 as pc2 master T282761 (duration: 00m 57s)
  • 12:14 kormat: setting pc1008 back as pc2 primary T282761
  • 11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ef49422: enwiki: Disable indexing on the Book namespace (T283522) (duration: 00m 56s)
  • 11:4