Difference between revisions of "Server Admin Log"

imported>Stashbot
(dpifke@deploy1001: Finished deploy [performance/arc-lamp@fc5f1c6]: Deploying latest attempt to fix T259167 (duration: 01m 03s))
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(314 intermediate revisions by 3 users not shown)
== 2020-08-11 ==
* 00:33 dpifke@deploy1001: Finished deploy [performance/arc-lamp@fc5f1c6]: Deploying latest attempt to fix [[phab:T259167|T259167]] (duration: 01m 03s)
* 00:31 dpifke@deploy1001: Started deploy [performance/arc-lamp@fc5f1c6]: Deploying latest attempt to fix [[phab:T259167|T259167]]
* 00:24 mutante: reverting switch of releases.wikimedia.org for today since releases-jenkins.wikimedia.org is tied to it and new jenkins still needs some config and plugins ([[phab:T247652|T247652]])
* 00:08 mutante: releases-jenkins.wikimedia.org currently under maintenance ([[phab:T247652|T247652]])

== 2021-08-03 ==
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master [[phab:T287574|T287574]]
* 09:12 moritzm: installing php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2020-08-10 ==
* 23:56 eileen: tools revision changed from {{Gerrit|22550f38c5}}

== 2021-08-02 ==
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 21:31 tzatziki: removing 1 file for legal compliance
* 21:16 tzatziki: removing 7 files for legal compliance
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 19:00 urbanecm: Morning B&C window completed
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 12:20 mutante: gerrit servers: disabling puppet
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 11:27 hashar: restarting Jenkins on contint2001
* 11:27 hashar: restarting Jenkins on contint1001
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 urbanecm: EU B&C window completed
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 11:08 moritzm: installing openjdk-11 security updates
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 07:24 moritzm: installing libsndfile security updates on buster
* 07:12 moritzm: installing aspell security updates
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)
 
== 2021-07-31 ==
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
 
== 2021-07-30 ==
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00
* 17:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:14 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:14 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:12 volans@
* 16:27 godog: upgrade grafana to 8 beta 2 on grafana2001
* 15:48 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:46 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:46 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:44 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:43 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:33 moritzm: installing graphviz security updates on buster
* 15:31 ryankemper: [cloudelastic] `ryankemper@cloudelastic1003:~$ sudo systemctl restart *search*` to clear `Check systemd state` alert on `cloudelastic1003`
* 15:30 _joe_: test
* 15:23 moritzm: installing graphviz security updates on buster
* 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16128 and previous config saved to /var/cache/conftool/dbconfig/20210520-143825-marostegui.json
* 13:58 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.6 (duration: 01m 05s)
* 13:57 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.6
* 13:52 hashar@deploy1002: Synchronized php-1.37.0-wmf.6/includes/upload/UploadFromStash.php: UploadFromStash: convert default user from false to null - [[phab:T283196|T283196]] (duration: 01m 05s)
* 13:50 hashar@deploy1002: Synchronized php-1.37.0-wmf.6/includes/user/ActorStore.php: ActorStore: avoid throwing in case of invalid usernames [[phab:T283167|T283167]] (duration: 01m 05s)
* 13:41 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.0 (duration: 01m 20s)
* 13:39 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.0
* 12:30 kormat: Deploying wmfmariadbpy 0.7 [[phab:T283228|T283228]]
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16126 and previous config saved to /var/cache/conftool/dbconfig/20210520-113529-root.json
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (


== 2020-08-09 ==
== 2021-05-08 ==
* 21:58 ejegg: updated payments-wiki from {{Gerrit|cd012f37f1}} to {{Gerrit|932aacde54}}
* 17:18 Amir1: starting upgrade of batch G of mailing lists ([[phab:T280322|T280322]])
* 03:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)


== 2020-08-08 ==
* 02:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 02:21 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload


== 2021-05-07 ==
* 21:40 legoktm: deleted education@ from MM3, didn't import properly
* 21:35 legoktm: deleted festivalsommer-teilnehmer from MM3, didn't import properly
* 21:33 legoktm: fixed owner for wdqs-gui-build list
* 19:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:55 legoktm: deleted daily-article-l from mailman3 after failed import
* 18:33 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
* 18:28 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
* 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
* 18:23 brennen: 1.37.0-wmf.4 train status ([[phab:T281145|T281145]]): blockers appear resolved, going ahead in the interest of not having a split deploy over weekend
* 17:50 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/cache/LinkBatch.php: Backport: [[gerrit:685901{{!}}LinkBatch: skip bad input (T282180 T282070)]] (duration: 01m 06s)
* 17:25 andrew@deploy1002: Finished deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev (duration: 01m 55s)
* 17:23 andrew@deploy1002: Started deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev
* 15:10 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 24s)
* 15:08 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 15:03 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 11s)
* 15:02 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 15:02 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 26s)
* 15:00 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 15:00 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 29s)
* 14:58 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 14:57 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 22s)
* 14:56 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 14:41 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
* 14:40 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 19s)
* 14:38 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 14:38 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 00m 50s)
* 14:37 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 13:04 Urbanecm: Start server-side upload for 1 video file ([[phab:T281927|T281927]])
* 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15856 and previous config saved to /var/cache/conftool/dbconfig/20210507-121908-kormat.json
* 12:04 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15855 and previous config saved to /var/cache/conftool/dbconfig/20210507-120404-kormat.json
* 11:49 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15854 and previous config saved to /var/cache/conftool/dbconfig/20210507-114859-kormat.json
* 11:33 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15853 and previous config saved to /var/cache/conftool/dbconfig/20210507-113355-kormat.json
* 09:55 dcausse: depooling wdqs1012 [[phab:T280382|T280382]], [[phab:T282222|T282222]]
* 09:44 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@codfw - [[phab:T281673|T281673]]
* 08:50 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2005.wikimedia.org
* 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
* 08:15 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqsin - [[phab:T281673|T281673]]
* 08:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15849 and previous config saved to /var/cache/conftool/dbconfig/20210507-074725-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15848 and previous config saved to /var/cache/conftool/dbconfig/20210507-073222-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15847 and previous config saved to /var/cache/conftool/dbconfig/20210507-071718-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15846 and previous config saved to /var/cache/conftool/dbconfig/20210507-070214-root.json
* 06:17 marostegui: Deploy schema change on s2 codfw, lag will appear [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 06:11 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/api/ApiQueryLogEvents.php: fix UBN [[phab:T282122|T282122]] (duration: 01m 10s)
* 06:09 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/api/ApiQueryLogEvents.php: fix UBN [[phab:T282122|T282122]] (duration: 01m 06s)
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for schema change', diff saved to https://phabricator.wikimedia.org/P15845 and previous config saved to /var/cache/conftool/dbconfig/20210507-055425-marostegui.json
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15844 and previous config saved to /var/cache/conftool/dbconfig/20210507-055350-root.json
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15842 and previous config saved to /var/cache/conftool/dbconfig/20210507-053847-root.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15841 and previous config saved to /var/cache/conftool/dbconfig/20210507-052343-root.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T282093|T282093]]', diff saved to https://phabricator.wikimedia.org/P15840 and previous config saved to /var/cache/conftool/dbconfig/20210507-051519-marostegui.json
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15839 and previous config saved to /var/cache/conftool/dbconfig/20210507-050839-root.json
* 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P15837 and previous config saved to /var/cache/conftool/dbconfig/20210507-043350-marostegui.json


== 2020-08-07 ==
* 16:42 jforrester@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/DiscussionTools/: [[phab:T259855|T259855]] Revert new reply API (duration: 01m 06s)
* 15:01 volans: import DNS names for network devices in Netbox - [[phab:T258729|T258729]]
* 13:27 godog: bounce pybal on lvs1016 and then lvs1015 to reset state, logstash1025 reported down but actually up
* 10:27 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:27 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
* 10:02 elukey: reboot deneb via ganeti2021 (hostname config pointing to recdns for some reason)
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1092', diff saved to https://phabricator.wikimedia.org/P12195 and previous config saved to /var/cache/conftool/dbconfig/20200807-091527-marostegui.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12194 and previous config saved to /var/cache/conftool/dbconfig/20200807-084747-marostegui.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12193 and previous config saved to /var/cache/conftool/dbconfig/20200807-080719-marostegui.json
* 07:50 godog: prometheus codfw lvextend --resize --size +60G /dev/mapper/vg--hdd-prometheus--global
* 07:49 godog: prometheus codfw lvextend --resize --size +30G /dev/mapper/vg--ssd-prometheus--k8s
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12192 and previous config saved to /var/cache/conftool/dbconfig/20200807-074658-marostegui.json
* 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for upgrade', diff saved to https://phabricator.wikimedia.org/P12191 and previous config saved to /var/cache/conftool/dbconfig/20200807-063431-marostegui.json


== 2021-05-06 ==
* 23:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: Rollback group1 and group2 to 1.37.0-wmf.3 ([[phab:T282193|T282193]])
* 22:52 legoktm: upgrading mailman3 and hyperkitty on lists1001 ([[phab:T282092|T282092]])
* 22:11 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials/SpecialWatchlist.php: Backport: [[gerrit:685890{{!}}Reorder tables in SpecialWatchlist (T282181)]] (duration: 00m 57s)
* 21:48 legoktm: upgraded mailman3 and hyperkitty on lists1002 ([[phab:T282092|T282092]])
* 21:46 legoktm: uploaded new mailman3 and hyperkitty packages to apt.wm.o ([[phab:T282092|T282092]])
* 21:11 hashar: restarted CI Jenkins due to [[phab:T281737|T281737]]
* 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
* 19:04 ejegg: updated fundraising CiviCRM from {{Gerrit|8034e47008}} to {{Gerrit|2052d79248}}
* 18:58 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:685906{{!}}Migrate WikidataCompletionSearchClicks to event platform on all wikis (T282140)]] (duration: 01m 04s)
* 18:55 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: {{Gerrit|338d1df5903cdc963b9eef22ec2c1750b7b3a02b}}: Wikibase: Use wikidataclient-test dblist for testwikidata localClientDatabases ([[phab:T282160|T282160]]) (duration: 01m 05s)
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: {{Gerrit|7e21cf0d96541d0ab5cb18cd7741756ab1dfe7b8}}: NO-OP: Wikibase: Use wikidataclient dblist directly for repo localClientDatabases ([[phab:T282160|T282160]]) (duration: 01m 04s)
* 18:31 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare WikidataCompletionSearchClicks stream and migrate on testwiki - [[phab:T282140|T282140]] (duration: 01m 06s)
* 17:59 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin1001.eqiad.wmnet
* 17:59 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
* 17:47 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.remove-downtime (exit_code=99) for cumin1001.eqiad.wmnet
* 17:47 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
* 17:35 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
* 17:20 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:15 volans: upgrade spicerack on cumin* to 0.0.52
* 17:15 ryankemper: [Elastic] Set `elastic2043` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
* 17:13 papaul: powerdown ms-be2057 for relocation
* 17:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:12 volans: uploaded spicerack_0.0.52 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 17:00 papaul: powerdown elastic2058 for relocation
* 16:43 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@ulsfo - [[phab:T281673|T281673]]
* 16:12 papaul: powerdown mc-gp2002 for relocation
* 16:09 ryankemper: [Elastic] Set `elastic2058` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
* 15:58 Amir1: starting upgrade of public mailing lists in group d and e ([[phab:T280322|T280322]])
* 15:50 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
* 15:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
* 15:42 papaul: powerdown logstash2027 for relocation
* 15:41 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 15:40 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 15:34 XioNoX: push cloud-gw-transport-eqiad to asw2-b-eqiad and cloudsw
* 15:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 15:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1012.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 15:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2003.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 15:31 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 15:29 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
* 15:29 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
* 15:26 ryankemper: [[phab:T280382|T280382]] [WDQS] Pooled `wdqs1007` and `wdqs2004`
* 15:26 ryankemper: [[phab:T280382|T280382]] `wdqs2004.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 15:26 ryankemper: [[phab:T280382|T280382]] `wdqs1007.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 15:20 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:16 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:14 papaul: powerdown ms-be2053 for relocation
* 15:10 moritzm: imported wmfbackups 0.5+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
* 15:07 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: [[phab:T270704|T270704]]
* 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: [[phab:T270704|T270704]]
* 15:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 105 hosts with reason: [[phab:T270704|T270704]]
* 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 105 hosts with reason: [[phab:T270704|T270704]]
* 15:06 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 15:05 moritzm: imported wmfmariadbpy 0.6+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
* 14:55 papaul: powerdown kafka-main2002 for relocation
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P15833 and previous config saved to /var/cache/conftool/dbconfig/20210506-143002-marostegui.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15829 and previous config saved to /var/cache/conftool/dbconfig/20210506-140916-marostegui.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15828 and previous config saved to /var/cache/conftool/dbconfig/20210506-133738-root.json
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15827 and previous config saved to /var/cache/conftool/dbconfig/20210506-132234-root.json
* 13:21 XioNoX: push pfw policies - [[phab:T281942|T281942]]
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15826 and previous config saved to /var/cache/conftool/dbconfig/20210506-130730-root.json
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15825 and previous config saved to /var/cache/conftool/dbconfig/20210506-125226-root.json
* 11:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts eventlog1002.eqiad.wmnet
* 11:35 mlitn@deploy1002: Synchronized wmf-config: Config: [[gerrit:685752{{!}}Enable Extension:MediaSearch on betacommons (T265939)]] (duration: 01m 06s)
* 11:34 mlitn@deploy1002: sync-file aborted: Config: [[gerrit:685752{{!}}Enable Extension:MediaSearch on betacommons (T265939)]] (duration: 00m 56s)
* 11:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 11:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
* 11:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts eventlog1002.eqiad.wmnet
* 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
* 11:23 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:685554{{!}}Enable ReferencePreviews as full default on pilot wikis (T271206)]] (duration: 01m 06s)
* 11:22 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:685554{{!}}Enable ReferencePreviews as full default on pilot wikis (T271206)]] (duration: 01m 06s)
* 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db1173 depooling: Reimage to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15824 and previous config saved to /var/cache/conftool/dbconfig/20210506-111256-kormat.json
* 11:12 kormat: reimaging db1173 to buster [[phab:T280751|T280751]]
* 10:59 volans: upgrading spicerack on cumin hosts to 0.0.51-1
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15823 and previous config saved to /var/cache/conftool/dbconfig/20210506-105909-marostegui.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15822 and previous config saved to /var/cache/conftool/dbconfig/20210506-105850-root.json
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15821 and previous config saved to /var/cache/conftool/dbconfig/20210506-104346-root.json
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15820 and previous config saved to /var/cache/conftool/dbconfig/20210506-102842-root.json
* 10:19 jynus: stop dbprov2002 in advance of maintenance [[phab:T281135|T281135]]
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15819 and previous config saved to /var/cache/conftool/dbconfig/20210506-101339-root.json
* 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 09:45 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P15818 and previous config saved to /var/cache/conftool/dbconfig/20210506-092217-marostegui.json
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15817 and previous config saved to /var/cache/conftool/dbconfig/20210506-091818-root.json
* 09:03 elukey: sudo apt-get remove linux-image-4.19.0-11-amd64 linux-image-4.19.0-9-amd64 linux-image-4.19.0-13-amd64 on ping[123]001 host to free some space (tiny root partition, these are old kernels)
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15816 and previous config saved to /var/cache/conftool/dbconfig/20210506-090315-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15815 and previous config saved to /var/cache/conftool/dbconfig/20210506-084811-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 db1167', diff saved to https://phabricator.wikimedia.org/P15814 and previous config saved to /var/cache/conftool/dbconfig/20210506-084754-marostegui.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and db1167 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15813 and previous config saved to /var/cache/conftool/dbconfig/20210506-084443-marostegui.json
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15812 and previous config saved to /var/cache/conftool/dbconfig/20210506-083910-root.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15811 and previous config saved to /var/cache/conftool/dbconfig/20210506-083307-root.json
* 08:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1007.eqiad.wmnet
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15810 and previous config saved to /var/cache/conftool/dbconfig/20210506-082406-root.json
* 08:23 moritzm: imported wikimedia-lvs-realserver to apt.wikimedia.org/bullseye [[phab:T275873|T275873]]
* 08:18 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1007.eqiad.wmnet
* 08:16 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1006.eqiad.wmnet
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15809 and previous config saved to /var/cache/conftool/dbconfig/20210506-080902-root.json
* 08:06 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1006.eqiad.wmnet
* 08:04 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1005.eqiad.wmnet
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15808 and previous config saved to /var/cache/conftool/dbconfig/20210506-075416-marostegui.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15807 and previous config saved to /var/cache/conftool/dbconfig/20210506-075359-root.json
* 07:47 jynus: shutting down and removing db2098:s3 instance
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15806 and previous config saved to /var/cache/conftool/dbconfig/20210506-074746-marostegui.json
* 07:45 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1005.eqiad.wmnet
* 07:29 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@cp[4026,4032] - [[phab:T281673|T281673]]
* 07:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 07:24 moritzm: installing exim security updates on bullseye hosts
* 07:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15805 and previous config saved to /var/cache/conftool/dbconfig/20210506-064020-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15804 and previous config saved to /var/cache/conftool/dbconfig/20210506-062931-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15803 and previous config saved to /var/cache/conftool/dbconfig/20210506-062915-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15802 and previous config saved to /var/cache/conftool/dbconfig/20210506-062516-root.json
* 06:20 elukey: apt-get clean on ping[1,2,3]001 to free some space
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15801 and previous config saved to /var/cache/conftool/dbconfig/20210506-061427-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15800 and previous config saved to /var/cache/conftool/dbconfig/20210506-061411-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15799 and previous config saved to /var/cache/conftool/dbconfig/20210506-061012-root.json
* 06:01 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 06:00 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 06:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15798 and previous config saved to /var/cache/conftool/dbconfig/20210506-055923-root.json
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15797 and previous config saved to /var/cache/conftool/dbconfig/20210506-055907-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 [[phab:T281445|T281445]]', diff saved to https://phabricator.wikimedia.org/P15796 and previous config saved to /var/cache/conftool/dbconfig/20210506-055535-marostegui.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15795 and previous config saved to /var/cache/conftool/dbconfig/20210506-055509-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15794 and previous config saved to /var/cache/conftool/dbconfig/20210506-054419-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15793 and previous config saved to /var/cache/conftool/dbconfig/20210506-054404-root.json
* 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 05:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 and db1158 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15792 and previous config saved to /var/cache/conftool/dbconfig/20210506-053801-marostegui.json
* 05:38 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 05:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 05:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:32 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/page/PageReferenceValue.php: fixing [[phab:T282070|T282070]] RC/log breakage due to unblocking autoblocks (duration: 01m 09s)
* 05:27 effie: upgrade scap to 3.17.1-1 - [[phab:T279695|T279695]]
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
* 03:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
* 03:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
* 03:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
* 03:38 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1007.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:38 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2004.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:18 ryankemper: [Elastic] `elastic2043` is ssh unreachable. Power cycling it to bring it briefly back online - if it has the shard it should be able to repair the cluster state. Otherwise I'll have to delete the index for `enwiki_titlesuggest_1620184482` given the data would be unrecoverable
* 03:08 ryankemper: [Elastic] `ryankemper@elastic2044:~$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/_cluster/settings -d '<nowiki>{"transient":{"cluster.routing.allocation.exclude":{"_host": null,"_name": null}}}</nowiki>'`
* 03:08 ryankemper: [Elastic] Temporarily unbanning `elastic2033` and `elastic2043` from `production-search-codfw` to see if we can get the cluster green again. If it returns to green then we'll ban one node, wait for the shards to redistribute, and then ban the other
* 03:06 ryankemper: [Elastic] I banned two nodes simultaneously earlier today - if there's an index with only 1 replica, and its primary and replica happened to be on the two nodes I banned, then that would have caused this situation
* 03:04 ryankemper: [Elastic] It looks like we've got a single missing shard in `production-search-codfw` (port 9200), which is putting the cluster into red status. The cluster won't get back into green status without intervention
* 02:56 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 00:35 Amir1: sudo service mailman3-web restart


== 2020-08-06 ==
== 2021-05-05 ==
* 23:21 catrope@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/GrowthExperiments/: Fixes for WelcomeSurvey language question ([[phab:T232410|T232410]]) (duration: 00m 59s)
* 23:35 ryankemper: [[phab:T281621|T281621]] [[phab:T281327|T281327]] [Elastic] Banned `elastic2033` and `elastic2043` from the CirrusSearch Elasticsearch clusters
* 23:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change GrowthExperiments mentor list on fawiki ([[phab:T253291|T253291]]) (duration: 00m 59s)
* 23:10 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GlobalWatchlist/modules/SpecialGlobalWatchlist.display.css: {{Gerrit|4947241f876234aabc578409c3691fb791c8f715}}: Fix centering of as-of label (duration: 01m 08s)
* 21:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:13 mutante: welcome new deployer derick - user created on deploy1002 and bastions ([[phab:T281564|T281564]])
* 21:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:05 mutante: pushing puppet run on all bastion hosts
* 21:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 21:45 mutante: mailing lists: approved Alangi Derick's pending request for membership in ops mailing list (is becoming deployer) [[phab:T281309|T281309]]
* 21:39 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 21:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/CentralAuth/includes/CentralAuthUser.php: {{Gerrit|52b134ed84c1c8ef5fcd6927f03567879553d31c}}: Cross-wiki block should pass correct wiki blocker ([[phab:T281972|T281972]]) (duration: 01m 09s)
* 21:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 21:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/CentralAuth/includes/CentralAuthUser.php: {{Gerrit|6526884848d0bb88c83cec2c6b39461542e21ef6}}: Cross-wiki block should pass correct wiki blocker ([[phab:T281972|T281972]]) (duration: 01m 08s)
* 21:35 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 21:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/user/UserIdentityValue.php: {{Gerrit|f189c4627cfc692fb743160030a5e5ab92df1485}}: UserIdentityValue: Introduce convenience static factory methods ([[phab:T281972|T281972]]) (duration: 01m 09s)
* 21:33 brennen@deploy1001: Synchronized php-1.36.0-wmf.3/vendor: [[gerrit:618850{{!}}Update git submodules (vendor)]] ([[phab:T259832|T259832]]) (duration: 01m 08s)
* 21:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/includes/user/UserIdentityValue.php: {{Gerrit|8ffb52d5cad9e003696200b9cd3e957ab26bc868}}: UserIdentityValue: Introduce convenience static factory methods ([[phab:T281972|T281972]]) (duration: 01m 11s)
* 21:32 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 21:29 urbanecm@deploy1002: sync-file aborted: {{Gerrit|8ffb52d5cad9e003696200b9cd3e957ab26bc868}}: UserIdentityValue: Introduce convenience static factory methods ([[phab:T281972|T281972]]) (duration: 00m 04s)
* 20:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 20:37 ejegg: updated email preferences wiki (donorwiki) from {{Gerrit|d449599540}} to {{Gerrit|9f51ace546}}
* 20:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 20:36 ejegg: updated payments-wiki from {{Gerrit|d449599540}} to {{Gerrit|9f51ace546}}
* 20:47 shdubsh: restart logstash -- pipeline appears stuck
* 20:20 ejegg: updated email preferences wiki (donorwiki) from {{Gerrit|a232fc3438}} to {{Gerrit|d449599540}}
* 20:38 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 19:59 jbond42: re-enable puppet post 685485
* 20:38 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 19:53 jbond42: disable puppet: rolling out change (685485) which affects all hosts
* 20:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 19:21 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
* 20:19 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 19:19 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
* 20:19 brennen: manually updating the vendor submodule on 1.36.0 for [[phab:T259832|T259832]]
* 19:16 jbond42: ignore the last log message; will wait for deploy to finish
* 20:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:16 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/tests/phpunit/includes: Backport: [[gerrit:685480{{!}}Fix order of joins in SpecialRecentChanges (T281981)]] (duration: 01m 10s)
* 20:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:16 jbond42: disable puppet: rolling out change (685485) which affects all hosts
* 19:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:14 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials: Backport: [[gerrit:685480{{!}}Fix order of joins in SpecialRecentChanges (T281981)]] (duration: 01m 08s)
* 19:47 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:10 Amir1: starting migration of public mailing lists in group b and c to mailman3 ([[phab:T280322|T280322]])
* 19:45 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - wgEventStreams - fix another typo in eventgate stream config - [[phab:T251935|T251935]] (duration: 00m 58s)
* 19:01 brennen: 1.37.0-wmf.4 train status ([[phab:T281145|T281145]]): deploying patch for [[phab:T282038|T282038]] and then rolling forward to group1.
* 19:40 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - wgEventStreams - fix typo in eventgate stream config - [[phab:T251935|T251935]] (duration: 00m 59s)
* 18:59 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[46].eqsin.wmnet
* 19:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:50 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[35].eqsin.wmnet
* 19:26 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 18:43 tgr_: Morning deploys done
* 19:04 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.3
* 18:43 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: [[gerrit:685482{{!}}Prevent edit notices from appearing (T281960)]] (duration: 01m 08s)
* 18:58 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:42 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: [[gerrit:685483{{!}}Prevent edit notices from appearing (T281960)]] (duration: 01m 08s)
* 18:57 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 18:40 tgr@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:679938{{!}}flaggedrevs.php: Use MediaWikiServices, not an extension function]] (duration: 01m 08s)
* 18:29 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 18:34 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/Popups/includes: Backport: [[gerrit:685478{{!}}Enable Reference Previews for more users (T271206)]] (duration: 01m 08s)
* 18:29 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 18:33 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/includes: Backport: [[gerrit:685477{{!}}Enable Reference Previews for more users (T271206)]] (duration: 01m 11s)
* 18:21 Urbanecm: Morning B&C window was completed
* 18:24 tgr@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:677002{{!}}replace mwlog1001 with new mwlog[12]002 hosts (T224565)]] (duration: 01m 24s)
* 18:20 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/GrowthExperiments/modules/: {{Gerrit|fb4a80830d7d915479e097cc82c681c5fb03d51b}}: Fix "Ask mentor" help panel button styling ([[phab:T250235|T250235]]) (duration: 01m 07s)
* 17:59 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp501[3456].eqsin.wmnet,service=ats-be
* 18:11 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=ats-tls
* 18:11 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=varnish-fe
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9db96595695b5ec1144c078e8961b3c04e8983cf}}: Remove temporary logging for mediamoderation ([[phab:T259742|T259742]]) (duration: 01m 07s)
* 17:59 mutante: adding a systemd timer to all thumbor servers that writes output of fc-list command into /srv/fc-list/fc-list ([[phab:T280718|T280718]])
* 18:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9695811a30de30471a81b6ad05aa5e625f52caf1}}: Enable DiscussionTools as a beta feature on 8 more wikis ("phase 1") ([[phab:T259574|T259574]]) (duration: 01m 06s)
* 17:58 XioNoX: push pfw policies - [[phab:T281942|T281942]]
* 17:42 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.3 (duration: 01m 06s)
* 17:10 ejegg: updated standalone SmashPig deploy from {{Gerrit|250a8570d1}} to {{Gerrit|be272c02ce}}
* 17:41 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.3
* 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15786 and previous config saved to /var/cache/conftool/dbconfig/20210505-155453-root.json
* 17:37 brennen: train 1.36.0-wmf.3: proceeding to group1
* 15:43 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga2001.wikimedia.org
* 17:36 brennen@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/WikibaseMediaInfo/src/View/MediaInfoEntityTermsView.php: Backport: [[gerrit:618582{{!}}Fix array unpacking as argument list]] ([[phab:T259745|T259745]]) (duration: 01m 07s)
* 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15785 and previous config saved to /var/cache/conftool/dbconfig/20210505-153949-root.json
* 16:32 chrisalbon@deploy1001: Finished deploy [ores/deploy@f3c44be]: [[phab:T258435|T258435]] (duration: 14m 12s)
* 15:25 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga2001.wikimedia.org
* 16:18 dpifke@deploy1001: Finished deploy [performance/arc-lamp@7838c88]: Deploying fixes for [[phab:T259167|T259167]] (duration: 00m 05s)
* 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15784 and previous config saved to /var/cache/conftool/dbconfig/20210505-152445-root.json
* 16:18 dpifke@deploy1001: Started deploy [performance/arc-lamp@7838c88]: Deploying fixes for [[phab:T259167|T259167]]
* 15:23 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga1001.wikimedia.org
* 16:18 chrisalbon@deploy1001: Started deploy [ores/deploy@f3c44be]: [[phab:T258435|T258435]]
* 15:11 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga1001.wikimedia.org
* 15:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 15:10 herron: decommissioning icinga[12]001 hosts [[phab:T279601|T279601]] [[phab:T279602|T279602]]
* 15:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 15:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 [[phab:T280751|T280751]]
* 15:10 fdans@deploy1001: Finished deploy [analytics/refinery@97a02a3]: Regular analytics weekly train [analytics/refinery@97a02a3] (duration: 20m 01s)
* 15:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 [[phab:T280751|T280751]]
* 14:50 fdans@deploy1001: Started deploy [analytics/refinery@97a02a3]: Regular analytics weekly train [analytics/refinery@97a02a3]
* 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 30%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15783 and previous config saved to /var/cache/conftool/dbconfig/20210505-150942-root.json
* 14:00 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-* test.event streams - [[phab:T251935|T251935]] (duration: 01m 08s)
* 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 20%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15782 and previous config saved to /var/cache/conftool/dbconfig/20210505-145438-root.json
* 13:32 jayme: updated helm to 2.16.9-2 on contint*, deploy* and chartmuseum*
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15781 and previous config saved to /var/cache/conftool/dbconfig/20210505-144431-root.json
* 13:24 jayme: imported helm_2.16.9-2 and tiller_2.16.9-2 to buster-wikimedia, jessie-wikimedia and stretch-wikimedia
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15780 and previous config saved to /var/cache/conftool/dbconfig/20210505-143934-root.json
* 12:06 kart_: Updated cxserver to 2020-08-05-070016-production ([[phab:T258919|T258919]], [[phab:T199523|T199523]], [[phab:T257943|T257943]], [[phab:T256194|T256194]])
* 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15779 and previous config saved to /var/cache/conftool/dbconfig/20210505-142927-root.json
* 12:03 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Reimage db2129 [[phab:T280751|T280751]]
* 11:59 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Reimage db2129 [[phab:T280751|T280751]]
* 11:57 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15778 and previous config saved to /var/cache/conftool/dbconfig/20210505-142431-root.json
* 11:54 Lucas_WMDE: EU backport window done
* 14:19 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
* 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/Flow/: Backport: [[gerrit:618580{{!}}Pass jQuery objects into jqueryMsg]] (duration: 01m 09s)
* 14:18 marostegui: Upgrade kernel and enable report_host on db1126
* 11:53 XioNoX: reboot cr2-eqord - [[phab:T259621|T259621]]
* 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 to enable report_host', diff saved to https://phabricator.wikimedia.org/P15777 and previous config saved to /var/cache/conftool/dbconfig/20210505-141735-marostegui.json
* 11:37 XioNoX: drain traffic away cr2-eqord - [[phab:T259621|T259621]]
* 14:17 kormat@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
* 11:27 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/Wikibase/lib/: Backport: [[gerrit:618579{{!}}Fix CachingFallbackLabelDescriptionLookup failing in edge-cases (T259744)]] (duration: 01m 10s)
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15776 and previous config saved to /var/cache/conftool/dbconfig/20210505-141423-root.json
* 11:22 XioNoX: reboot cr2-eqdfw - [[phab:T259621|T259621]]
* 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15775 and previous config saved to /var/cache/conftool/dbconfig/20210505-135920-root.json
* 11:13 XioNoX: drain traffic away cr2-eqdfw - [[phab:T259621|T259621]]
* 13:58 kevinbazira@deploy1002: Finished deploy [ores/deploy@5612f30]: Regular ORES Deployment [[phab:T278723|T278723]] (duration: 16m 47s)
* 10:52 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 13:48 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:685062{{!}}Revert "Enable ReferencePreviews on first wikis CommonSettings" ()]] (duration: 02m 08s)
* 10:48 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 13:41 kevinbazira@deploy1002: Started deploy [ores/deploy@5612f30]: Regular ORES Deployment [[phab:T278723|T278723]]
* 10:45 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 for schema change', diff saved to https://phabricator.wikimedia.org/P15774 and previous config saved to /var/cache/conftool/dbconfig/20210505-133259-marostegui.json
* 10:23 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15773 and previous config saved to /var/cache/conftool/dbconfig/20210505-133202-root.json
* 10:16 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 13:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Reimage db2129 [[phab:T280751|T280751]]
* 10:14 jynus@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Reimage db2129 [[phab:T280751|T280751]]
* 10:12 jynus@cumin2001: START - Cookbook sre.hosts.downtime
* 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15772 and previous config saved to /var/cache/conftool/dbconfig/20210505-131658-root.json
* 10:11 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 13:12 kormat: reimaging db2129 to buster [[phab:T280751|T280751]]
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1127', diff saved to https://phabricator.wikimedia.org/P12188 and previous config saved to /var/cache/conftool/dbconfig/20200806-084406-marostegui.json
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15771 and previous config saved to /var/cache/conftool/dbconfig/20210505-130155-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12187 and previous config saved to /var/cache/conftool/dbconfig/20200806-083743-marostegui.json
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15770 and previous config saved to /var/cache/conftool/dbconfig/20210505-124651-root.json
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12186 and previous config saved to /var/cache/conftool/dbconfig/20200806-083033-marostegui.json
* 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 for schema change', diff saved to https://phabricator.wikimedia.org/P15769 and previous config saved to /var/cache/conftool/dbconfig/20210505-122351-marostegui.json
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P12185 and previous config saved to /var/cache/conftool/dbconfig/20200806-081416-marostegui.json
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15768 and previous config saved to /var/cache/conftool/dbconfig/20210505-121353-root.json
* 07:03 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 12:01 moritzm: installing exim security updates on stretch
* 06:57 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15767 and previous config saved to /var/cache/conftool/dbconfig/20210505-115849-root.json
* 06:57 marostegui: Truncate tables on zerowiki [[phab:T227717|T227717]]
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15765 and previous config saved to /var/cache/conftool/dbconfig/20210505-114345-root.json
* 06:53 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15764 and previous config saved to /var/cache/conftool/dbconfig/20210505-112842-root.json
* 06:47 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 11:25 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|3565427dcd80e78352c99eb322de3318ae89a4ee}}: Enable ReferencePreviews on first wikis ([[phab:T271206|T271206]]; 2/2) (duration: 01m 10s)
* 06:43 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
* 11:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4f3051bf286b89e47ef153532de76756f2e7ade9}}: Enable ReferencePreviews on first wikis ([[phab:T271206|T271206]]; 1/2) (duration: 01m 20s)
* 06:37 elukey: roll restart of druid clusters' zookeeper and an-conf* zookeeper for openjdk-11 upgrades
* 11:17 urbanecm@deploy1002: Scap failed!: Call to mwscript eval.php stderr: not empty
* 06:36 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|289dc34feeb0703bb45f4a71c149cd607ef26455}}: Enable new language button for all logged in users outside test projects ([[phab:T280526|T280526]]) (duration: 02m 24s)
* 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 09:54 hashar: Restarted Zuul / CI
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 for MCR', diff saved to https://phabricator.wikimedia.org/P12184 and previous config saved to /var/cache/conftool/dbconfig/20200806-050743-marostegui.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15762 and previous config saved to /var/cache/conftool/dbconfig/20210505-094945-root.json
* 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079', diff saved to https://phabricator.wikimedia.org/P12182 and previous config saved to /var/cache/conftool/dbconfig/20200806-045622-marostegui.json
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15761 and previous config saved to /var/cache/conftool/dbconfig/20210505-094005-root.json
* 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12181 and previous config saved to /var/cache/conftool/dbconfig/20200806-045107-marostegui.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15760 and previous config saved to /var/cache/conftool/dbconfig/20210505-093441-root.json
* 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12180 and previous config saved to /var/cache/conftool/dbconfig/20200806-044608-marostegui.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 80%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15759 and previous config saved to /var/cache/conftool/dbconfig/20210505-092501-root.json
* 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P12179 and previous config saved to /var/cache/conftool/dbconfig/20200806-043758-marostegui.json
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15758 and previous config saved to /var/cache/conftool/dbconfig/20210505-091938-root.json
* 03:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2019.codfw.wmnet
* 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 70%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15757 and previous config saved to /var/cache/conftool/dbconfig/20210505-090957-root.json
* 02:24 eileen: process-control config revision is {{Gerrit|525eb71235}} turn off delete deleted contacts
* 09:08 hashar: Upgraded Jenkins ldap plugin from 1.26 to 2.6 # [[phab:T281737|T281737]]
* 01:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15756 and previous config saved to /var/cache/conftool/dbconfig/20210505-090434-root.json
* 01:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:55 hashar: Restarting CI Jenkins # [[phab:T281737|T281737]]
* 01:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 60%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15755 and previous config saved to /var/cache/conftool/dbconfig/20210505-085454-root.json
* 01:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:50 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 01:17 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 08:47 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 01:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 08:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 00:35 mutante: wtp2019 - reimaging - parsoid service does not work, unlike on all other wtp*, making sure it's clean
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15754 and previous config saved to /var/cache/conftool/dbconfig/20210505-083950-root.json
* 00:00 mutante: LDAP - removed demon from nda group
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P15753 and previous config saved to /var/cache/conftool/dbconfig/20210505-083810-marostegui.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P15752 and previous config saved to /var/cache/conftool/dbconfig/20210505-082609-marostegui.json
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 35%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15751 and previous config saved to /var/cache/conftool/dbconfig/20210505-082446-root.json
* 08:13 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org buster-wikimedia
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 30%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15750 and previous config saved to /var/cache/conftool/dbconfig/20210505-080942-root.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15749 and previous config saved to /var/cache/conftool/dbconfig/20210505-075438-root.json
* 07:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 20%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15748 and previous config saved to /var/cache/conftool/dbconfig/20210505-073934-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15747 and previous config saved to /var/cache/conftool/dbconfig/20210505-073722-marostegui.json
* 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15746 and previous config saved to /var/cache/conftool/dbconfig/20210505-073653-root.json
* 07:35 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 07:35 moritzm: rolling restart of cassandra in eqiad to pick up Java security updates
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15745 and previous config saved to /var/cache/conftool/dbconfig/20210505-073416-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15744 and previous config saved to /var/cache/conftool/dbconfig/20210505-073223-root.json
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 15%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15743 and previous config saved to /var/cache/conftool/dbconfig/20210505-072431-root.json
* 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15742 and previous config saved to /var/cache/conftool/dbconfig/20210505-072149-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15741 and previous config saved to /var/cache/conftool/dbconfig/20210505-071912-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15740 and previous config saved to /var/cache/conftool/dbconfig/20210505-071720-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 [[phab:T281794|T281794]]', diff saved to https://phabricator.wikimedia.org/P15739 and previous config saved to /var/cache/conftool/dbconfig/20210505-071132-marostegui.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15738 and previous config saved to /var/cache/conftool/dbconfig/20210505-070927-root.json
* 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15737 and previous config saved to /var/cache/conftool/dbconfig/20210505-070646-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15736 and previous config saved to /var/cache/conftool/dbconfig/20210505-070409-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15735 and previous config saved to /var/cache/conftool/dbconfig/20210505-070216-root.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15734 and previous config saved to /var/cache/conftool/dbconfig/20210505-065423-root.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15733 and previous config saved to /var/cache/conftool/dbconfig/20210505-065142-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15732 and previous config saved to /var/cache/conftool/dbconfig/20210505-064905-root.json
* 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15731 and previous config saved to /var/cache/conftool/dbconfig/20210505-064712-root.json
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 and db1156 to switch sanitarium hosts [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15730 and previous config saved to /var/cache/conftool/dbconfig/20210505-064204-marostegui.json
* 06:41 marostegui: Check tables on db1112 (lag might show up on s3 on wiki replicas) [[phab:T280492|T280492]]
* 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15729 and previous config saved to /var/cache/conftool/dbconfig/20210505-063920-root.json
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15728 and previous config saved to /var/cache/conftool/dbconfig/20210505-062416-root.json
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 1%: Slowly pool db1178 into s8 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15727 and previous config saved to /var/cache/conftool/dbconfig/20210505-060912-root.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1178 into dbctl [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15726 and previous config saved to /var/cache/conftool/dbconfig/20210505-060814-marostegui.json
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1104 from API', diff saved to https://phabricator.wikimedia.org/P15725 and previous config saved to /var/cache/conftool/dbconfig/20210505-060636-marostegui.json
* 06:00 marostegui: Restart mysqld on x1 database primary master (db1103) [[phab:T281212|T281212]]
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311 into main traffic', diff saved to https://phabricator.wikimedia.org/P15724 and previous config saved to /var/cache/conftool/dbconfig/20210505-053841-marostegui.json
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 into s1 vslow, remove db1099:3311', diff saved to https://phabricator.wikimedia.org/P15723 and previous config saved to /var/cache/conftool/dbconfig/20210505-053211-marostegui.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15722 and previous config saved to /var/cache/conftool/dbconfig/20210505-052943-marostegui.json
* 04:53 eileen: civicrm revision changed from {{Gerrit|e7c610fd87}} to {{Gerrit|8034e47008}}, config revision is {{Gerrit|189788d452}}
* 03:58 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 03:56 ryankemper: [[phab:T280563|T280563]] Reboot of `eqiad` complete. Only ~half of `codfw` is remaining.
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 03:54 ryankemper: [[phab:T280382|T280382]] `wdqs1011.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv  2.7T  998G  1.6T  39% /srv`
* 03:52 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 03:51 ryankemper: [[phab:T280382|T280382]] [WDQS] `ryankemper@wdqs2007:~$ sudo depool` (need to monitor host to see if it becomes ssh unreachable again or if it was a one-off; also high update lag)
* 03:50 ryankemper: [[phab:T280382|T280382]] `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv  2.7T  998G  1.6T  39% /srv`
* 03:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 03:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:59 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 01:55 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043` from cluster
* 01:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:49 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` (will likely fail due to underlying hw but we'll see)
* 01:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 01:45 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 01:45 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:43 ryankemper: [[phab:T280382|T280382]] [WDQS] `racadm>>racadm serveraction powercycle` on `wdqs2007`
* 01:39 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 01:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 00:29 eileen: civicrm revision changed from {{Gerrit|94e321dbe0}} to {{Gerrit|e7c610fd87}}, config revision is {{Gerrit|189788d452}}
* 00:15 ejegg: updated payments-wiki from {{Gerrit|44570561f2}} to {{Gerrit|d449599540}}
* 00:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f6ea8c0e5a4dc667969f5847207902727625bbe}}: Growth: enwiki: Add list of mentors ([[phab:T281896|T281896]]) (duration: 01m 10s)
* 00:00 urbanecm@deploy1002: Synchronized fc-list: {{Gerrit|93970496da7678d896b7f812b3bb5f4cf0b691ad}}: update fc-list to current version on buster ([[phab:T79424|T79424]]) (duration: 01m 09s)


== 2020-08-05 ==
== 2021-05-04 ==
* 23:57 eileen: civicrm revision changed from {{Gerrit|150c3476c4}} to {{Gerrit|72452e28a9}}, config revision is {{Gerrit|b6ece03513}}
* 23:41 urbanecm@deploy1002: Synchronized wmf-config/config/enwiki.yaml: {{Gerrit|d29dbb2f435afe64f2fee15b430ee04d5d13c8d7}}: Enable Growth features on enwiki in the dark mode ([[phab:T281896|T281896]]; 3/3) (duration: 01m 09s)
* 23:02 shdubsh: logstash in codfw looks stuck -- restarting
* 23:40 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|d29dbb2f435afe64f2fee15b430ee04d5d13c8d7}}: Enable Growth features on enwiki in the dark mode ([[phab:T281896|T281896]]; 2/3) (duration: 01m 09s)
* 19:41 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.36.0-wmf.2
* 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d29dbb2f435afe64f2fee15b430ee04d5d13c8d7}}: Enable Growth features on enwiki in the dark mode ([[phab:T281896|T281896]]; 1/3) (duration: 01m 09s)
* 19:39 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:31 urbanecm@deploy1002: Synchronized wmf-config/config/bgwiki.yaml: {{Gerrit|5b4c516a1d0461065e27cacec5d2b1cb315a2c07}}: Enable Growth team features in dark mode on bgwiki ([[phab:T280824|T280824]]; 3/3) (duration: 01m 09s)
* 19:37 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:30 urbanecm@deploy1002: sync-file aborted: {{Gerrit|5b4c516a1d0461065e27cacec5d2b1cb315a2c07}}: Enable Growth team features in dark mode on bgwiki ([[phab:T280824|T280824]]; 3/3) (duration: 00m 03s)
* 19:13 brennen@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.3 (duration: 01m 44s)
* 23:30 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|5b4c516a1d0461065e27cacec5d2b1cb315a2c07}}: Enable Growth team features in dark mode on bgwiki ([[phab:T280824|T280824]]; 2/3) (duration: 01m 09s)
* 19:11 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.3
* 23:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5b4c516a1d0461065e27cacec5d2b1cb315a2c07}}: Enable Growth team features in dark mode on bgwiki ([[phab:T280824|T280824]]; 1/3) (duration: 01m 09s)
* 18:26 Lucas_WMDE: Morning backport window done
* 23:26 Urbanecm: Create tables for GrowthExperiments extension on enwiki ([[phab:T281896|T281896]])
* 18:25 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/ContentTranslation/: Backport: [[gerrit:618566{{!}}Pass jQuery objects into jqueryMsg]] (duration: 01m 11s)
* 23:24 Urbanecm: Create tables for GrowthExperiments extension on bgwiki ([[phab:T280824|T280824]])
* 18:14 mutante: test !log
* 23:22 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|a3c24f322b754c9a94c260ee5df4b5ae4de27f22}}: Avoid using User::getGroups() and ::getEffectiveGroups() ([[phab:T281823|T281823]]) (duration: 01m 10s)
* 18:11 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:618343{{!}}Re-enable growth study quick survey (T257015)]] (duration: 01m 12s)
* 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e467d92e5e257a3d2f9b05692db9accdd86ddb00}}: Add extendedconfirmed on ptwiki ([[phab:T281926|T281926]]) (duration: 01m 10s)
* 17:30 shdubsh: test prometheus-icinga-exporter upgrade on icinga2001
* 23:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|012d6138741ea76c985453428111aeddfdec2271}}: Add extendedconfirmed on azwiki ([[phab:T281860|T281860]]) (duration: 01m 10s)
* 16:50 elukey: powercycle stat1005 after GPU issue
* 22:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 15:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add eventgate-logging-external streams and destination_event_service settings - [[phab:T251935|T251935]] (duration: 01m 05s)
* 22:47 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 15:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 22:46 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 15:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 22:44 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 15:11 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:44 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 15:08 godog: bounce logstash on logstash100[789] - udp loss reported
* 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 15:05 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 14:48 elukey: reboot stat1008 for unexpected maintenance (GPU stuck)
* 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 14:33 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 21:30 eileen: civicrm revision changed from {{Gerrit|33a63d5789}} to {{Gerrit|94e321dbe0}}, config revision is {{Gerrit|a212d6ab23}}
* 14:32 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 21:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4 (duration: 03m 55s)
* 14:27 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 21:13 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4
* 14:27 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 20:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:25 moritzm: installing nmap bugfix updates from buster point release
* 20:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 20:09 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7] (duration: 05m 16s)
* 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 20:04 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7]
* 14:20 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7] (duration: 00m 07s)
* 14:20 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
* 20:03 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7]
* 14:14 moritzm: installing pillow security updates
* 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7] (duration: 17m 15s)
* 14:03 moritzm: installing node-minimist security updates
* 19:46 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7]
* 13:51 moritzm: installing Linux update to 4.19.132 from buster point update (no reboots, just the package updates)
* 19:38 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.4
* 13:32 jayme: updated helmfile to 0.125.2-0 and helm-diff to 3.1.2-1 on contint* and deploy*
* 17:58 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.4 (duration: 42m 33s)
* 13:28 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead (duration: 01m 46s)
* 13:24 volans@cumin1001: START - Cookbook sre.dns.netbox
* 17:24 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead
* 13:04 elukey: restart yarn resource managers on an-master100[12] to pick up new Yarn settings - https://gerrit.wikimedia.org/r/c/operations/puppet/+/618529
* 17:16 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.4
* 13:00 moritzm: installing libjpeg-turbo security updates on stretch
* 17:03 brennen: 1.37.0-wmf.4 was branched at {{Gerrit|f069fd8b5a6c817f4860fa68ae2f56b71a139f4a}} for [[phab:T281145|T281145]]
* 12:52 XioNoX: netmon1002:/srv/deployment/librenms/librenms$ sudo -u librenms ./lnms migrate
* 17:00 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org bullseye-wikimedia
* 12:49 jayme: imported helm-diff_3.1.2-1 to buster-wikimedia, jessie-wikimedia and stretch-wikimedia
* 16:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead (duration: 01m 54s)
* 12:46 moritzm: installing imagemagick security updates on buster
* 16:25 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead
* 12:33 moritzm: installing net-snmp security updates on icinga hosts
* 16:16 dzahn@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 11:36 awight: EU Bacon reclosed
* 16:15 dzahn@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 11:36 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:614891{{!}}Switch test wikis to new version of vector by default (3/3) (T254227)]] (duration: 01m 07s)
* 16:13 mutante: k8s: upgrading release=namespaces, helmfile apply to create miscweb namespace [[phab:T281538|T281538]]
* 11:29 awight: EU Bacon reopened
* 16:13 dzahn@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 11:28 awight: EU Bacon complete
* 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:26 awight@deploy1001: Synchronized wmf-config: Config: [[gerrit:618481{{!}}FileImporter: full default deployment (T232542)]] (duration: 01m 04s)
* 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:23 jayme: imported helm-diff_3.1.2-0 to jessie-wikimedia and stretch-wikimedia
* 16:12 dzahn@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 11:22 jayme: imported helm-diff_3.1.2-0 to buster-wikimedia
* 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:19 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:618303{{!}}Add import sources for lijwikisource (T259633)]] (duration: 01m 07s)
* 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:13 awight@deploy1001: sync-file aborted: Config: [[gerrit:618303{{!}}Add import sources for lijwikisource (T259633)]] (duration: 00m 13s)
* 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:10 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:595542{{!}}Enable Data Bridge on Test Wikidata clients (T232584)]] (duration: 01m 20s)
* 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:39 XioNoX: reboot cr3-ulsfo - [[phab:T259621|T259621]]
* 14:41 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 10:28 XioNoX: drain traffic away cr3-ulsfo - [[phab:T259621|T259621]]
* 13:46 moritzm: installing exim security updates on buster
* 10:21 moritzm: installing libssh security updates
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15721 and previous config saved to /var/cache/conftool/dbconfig/20210504-133950-root.json
* 10:18 XioNoX: reboot cr4-ulsfo - [[phab:T259621|T259621]]
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15720 and previous config saved to /var/cache/conftool/dbconfig/20210504-132446-root.json
* 09:58 XioNoX: drain traffic away cr4-ulsfo
* 13:14 moritzm: upgrading linux-libc-dev on buster hosts (to version introduced by 10.9 point release)
* 09:53 XioNoX: depool ulsfo - [[phab:T259621|T259621]]
* 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:32 elukey: set ticket max renewable lifetime to 7d on all kerberos clients (was zero, the default)
* 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:07 jayme: imported helmfile_0.125.2-0 to jessie-wikimedia
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15719 and previous config saved to /var/cache/conftool/dbconfig/20210504-130943-root.json
* 09:07 jayme: imported helmfile_0.125.2-0 to stretch-wikimedia
* 13:01 moritzm: installing debian-archive-keyring updates on buster
* 09:05 jayme: imported helmfile_0.125.2-0 to buster-wikimedia
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15718 and previous config saved to /var/cache/conftool/dbconfig/20210504-125439-root.json
* 08:39 marostegui: Remove revision triggers on db1125:3317
* 12:50 marostegui: Upgrade mysql and kernel on db1137 [[phab:T281212|T281212]]
* 08:39 marostegui: Stop replication on db1079 for MCR, this will generate lag on s7 on labsdb
* 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 to upgrade its mysql [[phab:T281212|T281212]]', diff saved to https://phabricator.wikimedia.org/P15717 and previous config saved to /var/cache/conftool/dbconfig/20210504-124937-marostegui.json
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 for MCR', diff saved to https://phabricator.wikimedia.org/P12173 and previous config saved to /var/cache/conftool/dbconfig/20200805-083916-marostegui.json
* 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15716 and previous config saved to /var/cache/conftool/dbconfig/20210504-124848-root.json
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P12172 and previous config saved to /var/cache/conftool/dbconfig/20200805-083833-marostegui.json
* 12:46 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after sanitarium master switch [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15715 and previous config saved to /var/cache/conftool/dbconfig/20210504-124647-kormat.json
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12171 and previous config saved to /var/cache/conftool/dbconfig/20200805-082908-marostegui.json
* 12:35 kormat@cumin1001: dbctl commit (dc=all): 'Depooling for sanitarium master switch [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15714 and previous config saved to /var/cache/conftool/dbconfig/20210504-123537-kormat.json
* 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12170 and previous config saved to /var/cache/conftool/dbconfig/20200805-082138-marostegui.json
* 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 [[phab:T280751|T280751]]
* 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P12169 and previous config saved to /var/cache/conftool/dbconfig/20200805-081237-marostegui.json
* 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 [[phab:T280751|T280751]]
* 07:49 marostegui: Stop mysql on db1117:3323 (this will generate haproxy irc alerts) [[phab:T259589|T259589]]
* 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15713 and previous config saved to /var/cache/conftool/dbconfig/20210504-123344-root.json
* 07:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 07:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|683b876}}: {{Gerrit|5763630}}: GrowthExperiments: Rename control variant to control, GrowthExperiments: Set linkrecommendation variant to 0 ([[phab:T281727|T281727]]) (duration: 00m 58s)
* 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 12:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/: {{Gerrit|8f938c2}}: {{Gerrit|c8c07ab}}: GrowthExperiments backports ([[phab:T281727|T281727]]) (duration: 00m 59s)
* 07:26 moritzm: installing perl security updates on buster
* 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15712 and previous config saved to /var/cache/conftool/dbconfig/20210504-121841-root.json
* 07:20 moritzm: installing libexif security updates on buster
* 12:08 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 07:14 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15711 and previous config saved to /var/cache/conftool/dbconfig/20210504-120337-root.json
* 07:13 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 11:58 marostegui: Upgrade mysql and kernel on db1120 [[phab:T281212|T281212]]
* 07:04 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 to upgrade its mysql [[phab:T281212|T281212]]', diff saved to https://phabricator.wikimedia.org/P15710 and previous config saved to /var/cache/conftool/dbconfig/20210504-115634-marostegui.json
* 07:04 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 11:40 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 06:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 11:31 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] );` on arwiki, bnwiki, viwiki ([[phab:T278710|T278710]], [[phab:T281703|T281703]])
* 06:59 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 11:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|87dff0b1abe588f0ddc62985fdb40b5ec0fa1bbd}}: GrowthExperiments: Enable link recommendations for target wikis ([[phab:T278710|T278710]]) (duration: 00m 57s)
* 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 11:10 Urbanecm: Create growthexperiments_link_recommendations and growthexperiments_link_submissions on arwiki,bnwiki,viwiki x1 ([[phab:T266913|T266913]])
* 06:50 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8228f6beacd2f7e94a65f32d41f558c0f440db0a}}: Disable ContentTranslation New article campaign in fiwiki ([[phab:T277473|T277473]]) (duration: 00m 59s)
* 06:50 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15707 and previous config saved to /var/cache/conftool/dbconfig/20210504-102649-root.json
* 06:46 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15705 and previous config saved to /var/cache/conftool/dbconfig/20210504-101145-root.json
* 06:46 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 09:57 moritzm: installing bind9 security updates on buster (client side tools/libs only)
* 05:53 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15704 and previous config saved to /var/cache/conftool/dbconfig/20210504-095642-root.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 for MCR', diff saved to https://phabricator.wikimedia.org/P12167 and previous config saved to /var/cache/conftool/dbconfig/20200805-050907-marostegui.json
* 09:45 godog: +50G for prometheus k8s in codfw
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1136', diff saved to https://phabricator.wikimedia.org/P12166 and previous config saved to /var/cache/conftool/dbconfig/20200805-050808-marostegui.json
* 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15703 and previous config saved to /var/cache/conftool/dbconfig/20210504-094138-root.json
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12165 and previous config saved to /var/cache/conftool/dbconfig/20200805-050308-marostegui.json
* 09:04 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12164 and previous config saved to /var/cache/conftool/dbconfig/20200805-045334-marostegui.json
* 09:04 moritzm: rolling restart of cassandra in codfw to pick up Java security updates
* 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1136', diff saved to https://phabricator.wikimedia.org/P12163 and previous config saved to /var/cache/conftool/dbconfig/20200805-043346-marostegui.json
* 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15702 and previous config saved to /var/cache/conftool/dbconfig/20210504-081716-root.json
* 08:02 marostegui: Check tables on db1106, lag will show up on s1 on wiki replicas ([[phab:T280492|T280492]])
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15701 and previous config saved to /var/cache/conftool/dbconfig/20210504-080213-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15700 and previous config saved to /var/cache/conftool/dbconfig/20210504-080212-root.json
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 from s1 vslow to get its tables checked and pool db1099:3311 instead [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15699 and previous config saved to /var/cache/conftool/dbconfig/20210504-080206-marostegui.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15698 and previous config saved to /var/cache/conftool/dbconfig/20210504-074639-root.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15697 and previous config saved to /var/cache/conftool/dbconfig/20210504-074632-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15696 and previous config saved to /var/cache/conftool/dbconfig/20210504-073135-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15695 and previous config saved to /var/cache/conftool/dbconfig/20210504-073127-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15694 and previous config saved to /var/cache/conftool/dbconfig/20210504-071632-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 10%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15693 and previous config saved to /var/cache/conftool/dbconfig/20210504-071623-root.json
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 and db1082 to change s5 sanitarium master [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15692 and previous config saved to /var/cache/conftool/dbconfig/20210504-071146-marostegui.json
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15691 and previous config saved to /var/cache/conftool/dbconfig/20210504-065034-root.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15690 and previous config saved to /var/cache/conftool/dbconfig/20210504-063530-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15689 and previous config saved to /var/cache/conftool/dbconfig/20210504-062027-root.json
* 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15688 and previous config saved to /var/cache/conftool/dbconfig/20210504-061700-root.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15687 and previous config saved to /var/cache/conftool/dbconfig/20210504-060523-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15686 and previous config saved to /var/cache/conftool/dbconfig/20210504-060156-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 to clone db1178 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15684 and previous config saved to /var/cache/conftool/dbconfig/20210504-055116-marostegui.json
* 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15683 and previous config saved to /var/cache/conftool/dbconfig/20210504-055020-root.json
* 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15682 and previous config saved to /var/cache/conftool/dbconfig/20210504-054653-root.json
* 05:45 marostegui: Stop mysql on db1158 to clone db1178
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158 to clone db1178 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15680 and previous config saved to /var/cache/conftool/dbconfig/20210504-054539-marostegui.json
* 05:36 marostegui: Deploy schema change on s6 codfw, lag will appear - [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15678 and previous config saved to /var/cache/conftool/dbconfig/20210504-053149-root.json
* 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15677 and previous config saved to /var/cache/conftool/dbconfig/20210504-052612-root.json
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15676 and previous config saved to /var/cache/conftool/dbconfig/20210504-051108-root.json
* 05:07 marostegui: Restart sanitarium hosts to pick up new filters [[phab:T263817|T263817]]
* 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15675 and previous config saved to /var/cache/conftool/dbconfig/20210504-045605-root.json
* 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15674 and previous config saved to /var/cache/conftool/dbconfig/20210504-044101-root.json
* 04:06 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 03:38 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 03:36 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]`
* 03:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 02:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
* 02:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
* 01:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]


== 2020-08-04 ==
== 2021-05-03 ==
* 22:41 brennen: restarting php7.2-fpm on mw1404 for opcache issues
* 23:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|230ef5716b34ca83348667f289180313b76ce8a3}}: Prepare for new configuration option ([[phab:T277951|T277951]]) (duration: 00m 57s)
* 21:45 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7c47ee17b3936fb1f79590187a9e0028276e4a9d}}: Replace $wgRelatedArticlesFooterWhitelistedSkins ([[phab:T277958|T277958]]) (duration: 00m 57s)
* 21:38 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 23:14 urbanecm@deploy1002: sync-file aborted: {{Gerrit|7c47ee17b3936fb1f79590187a9e0028276e4a9d}}: Replace $wgRelatedArticlesFooterWhitelistedSkins ([[phab:T277958|T277958]]) (duration: 00m 01s)
* 21:34 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 22:17 legoktm: ran disable_list for: iegcom wikien-l fundraiser spcommittee-private-l spcommittee-l mediation-en-l test-second wikifr-colloque-l
* 21:03 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 22:14 mutante: [backup1001:~] $ sudo check_bacula.py --icinga
* 21:03 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 21:56 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]`
* 20:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 21:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 20:55 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 21:54 ryankemper: [[phab:T280563|T280563]] eqiad reboot failed with: `curator.exceptions.FailedExecution: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.eqiad.wmnet', port=9243): Read timed out. (read timeout=10))`
* 20:52 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 21:52 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 20:27 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@c80e2e7]: use provided ca certs for elasticsearch (duration: 02m 22s)
* 21:47 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]`
* 20:25 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: use provided ca certs for elasticsearch
* 21:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 20:15 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@b17bfd4]: Move mjolnir daemons from cirrus hosts to dedicated instances (duration: 02m 07s)
* 21:32 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d95b91648}} (duration: 00m 58s)
* 20:12 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@b17bfd4]: Move mjolnir daemons from cirrus hosts to dedicated instances
* 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
* 19:19 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.3
* 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
* 19:11 brennen@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.3 (duration: 91m 03s)
* 21:22 ryankemper: [WDQS] `ryankemper@wdqs1003:~$ sudo pool`
* 19:03 brennen: current 1.36.0-wmf.3 train status ([[phab:T257971|T257971]]): mid scap-cdb-rebuild for testwiki sync; will proceed with group0 when finished.
* 21:20 ryankemper: [[phab:T280382|T280382]] [WDQS] `ryankemper@puppetmaster1001:~$ sudo confctl select 'name=wdqs1011.eqiad.wmnet' set/pooled=no`
* 18:55 sukhe: upload pdns-recursor_4.3.3-1~deb10u1 to apt.wm.o (buster) - [[phab:T252132|T252132]]
* 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1011.eqiad.wmnet
* 18:49 mutante: letting puppet install envoy on all ores1* hosts
* 21:09 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1011.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 18:46 mutante: letting puppet install envoy on all ores2* hosts
* 21:06 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 18:37 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 21:05 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:26 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 21:02 ryankemper: [[phab:T280382|T280382]] `wdqs1010.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  975G  1.5T  39% /srv`
* 18:19 mutante: temp disabling puppet on all ores hosts to add envoy
* 20:56 ryankemper: [[phab:T280382|T280382]] [WDQS] `ryankemper@wdqs2001:~$ sudo run-puppet-agent --force`
* 17:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:40 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 17:40 brennen@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.3
* 20:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 17:36 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 20:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:25 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 19:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 17:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 19:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 17:17 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 19:24 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --without-lvs --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 17:05 brennen: 1.36.0-wmf.3 was branched at {{Gerrit|2d0cf09cdf}} for [[phab:T257971|T257971]]
* 19:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:51 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 19:21 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet
* 16:49 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 19:21 ryankemper: [[phab:T280382|T280382]] [WDQS] `sudo confctl select 'name=wdqs1004.eqiad.wmnet' set/pooled=no` (`wdqs1004` failed re-image [not sure why yet] and won't let me ssh in to depool so using conftool instead)
* 16:43 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:20 Urbanecm: Morning B&C window done
* 16:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:19 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.bootstrap/index.js: {{Gerrit|cf9d9da3bf272d33c2d9b29d9172b1c81bfd8beb}}: Hotfix: loadRelatedArticles should consider existence of container element ([[phab:T281547|T281547]]) (duration: 00m 57s)
* 16:24 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/filebackend.php: {{Gerrit|bc1bc903169e4982c0c5a930094bed9f22616293}}: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads ([[phab:T281650|T281650]]; 2/2) (duration: 00m 57s)
* 16:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|bc1bc903169e4982c0c5a930094bed9f22616293}}: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads ([[phab:T281650|T281650]]; 1/2) (duration: 00m 58s)
* 16:15 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 17:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:20 hashar: Restarting CI Jenkins due to "Gearman worker contint2001.wikimedia.org_manager" thread dying unexpectedly # [[phab:T281737|T281737]]
* 16:05 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 16:30 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 16:29 ryankemper: [[phab:T281498|T281498]] `sudo confctl select 'name=wdqs2004.codfw.wmnet' set/pooled=yes:weight=10` after merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/684435
* 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 16:27 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2004.codfw.wmnet
* 15:53 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Set default topic_prefixes - [[phab:T255888|T255888]] (duration: 00m 58s)
* 16:19 legoktm: legoktm@lists1001:~$ sudo apt install default-mysql-client # for temporary debugging
* 15:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 15:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:39 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 15:44 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 15:39 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 15:27 Amir1: upgrade group A to mailman3 ([[phab:T280322|T280322]])
* 15:38 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:27 volans: uploaded conftool_1.3.1 to apt.wikimedia.org bullseye-wikimedia
* 15:31 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:43 volans: uploaded cumin_4.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 15:18 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove now unused wgEventServiceStreamConfig - [[phab:T229863|T229863]] (duration: 00m 58s)
* 13:10 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] )` on cswiki to make the user a proper system user ([[phab:T281703|T281703]])
* 15:18 moritzm: installing jackson-databind security updates
* 12:36 kostajh: Backport window done
* 15:08 moritzm: installing qemu security updates on cloudvirt* Stretch hosts
* 12:33 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:684378{{!}}GrowthExperiments: Set default variant (T278123)]] [[gerrit:684331{{!}}GrowthExperiments: enable link recommendations frontend on cswiki (T278710)]] (duration: 00m 57s)
* 14:54 cmjohnson1: swapping kubernetes1010 network cable [[phab:T257542|T257542]]
* 12:07 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:684327{{!}}GrowthExperiments: enable link recommendations backend on cswiki (T278710)]] (duration: 00m 57s)
* 14:48 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 11:56 kharlan@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments: Backport: [[gerrit:684080{{!}}refreshLinkRecommendations.php: Use per-wiki locks]] [[gerrit:684078{{!}}Handle DB readonly errors (T281382)]] (duration: 00m 58s)
* 14:41 cmjohnson1: powercycling analytics1050 [[phab:T258370|T258370]]
* 11:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: {{Gerrit|a438b641c81fa16faba287407012beaff8b1f3ba}}: Fix settings dialog offering ReferencePreviews when unavailable ([[phab:T281352|T281352]]) (duration: 00m 58s)
* 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136 for MCR', diff saved to https://phabricator.wikimedia.org/P12161 and previous config saved to /var/cache/conftool/dbconfig/20200804-143524-marostegui.json
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c5a7c67b4daf33e0f9aaabec3f35ab6d4184894b}}: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere ([[phab:T279853|T279853]]) (duration: 00m 57s)
* 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12160 and previous config saved to /var/cache/conftool/dbconfig/20200804-142710-marostegui.json
* 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f1a5ef0116c77b86b1abfb7bfa7d4ed363c69f61}}: wikidata: post edit constraint jobs on 70% of edits ([[phab:T204031|T204031]]) (duration: 00m 57s)
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12159 and previous config saved to /var/cache/conftool/dbconfig/20200804-142220-marostegui.json
* 10:59 moritzm: installing avahi security updates on buster
* 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12158 and previous config saved to /var/cache/conftool/dbconfig/20200804-141556-marostegui.json
* 10:47 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:684302{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P12157 and previous config saved to /var/cache/conftool/dbconfig/20200804-141004-marostegui.json
* 10:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:684302{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 09:42 moritzm: installing python3.7 security updates
* 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'coredns' .
* 09:41 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a] (duration: 29m 24s)
* 13:56 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
* 09:12 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a]
* 13:51 hashar: Install newer openjdk on contint2001 and restarting CI Jenkins
* 09:10 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a] (duration: 00m 07s)
* 12:00 jayme: helm was updated: 2.16.7-2 -> 2.16.9-1 on chartmuseum*, contint*, deploy*
* 09:10 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a]
* 11:43 Lucas_WMDE: EU backport window done
* 09:09 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a] (duration: 16m 06s)
* 11:41 marostegui: Deploy schema change on s3 codfw master, lag might show up on codfw s3 [[phab:T259238|T259238]]
* 08:52 joal@deploy1002: Started deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a]
* 11:37 moritzm: installing openjdk-11 security updates
* 08:01 moritzm: installing edk2 security updates
* 11:36 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:618266{{!}}Load WikibaseRepo using extension registration in production (T257433)]] (duration: 00m 58s)
* 07:31 moritzm: installing libimage-exiftool-perl security updates
* 11:12 Lucas_WMDE: Deployed patch for [[phab:T86738|T86738]] / [[phab:T259565|T259565]]
* 11:03 moritzm: installing e2fsprogs security updates for stretch
* 10:47 moritzm: installing tomcat8 security updates
* 10:47 vgutierrez: upgrade acme-chief to version 0.28
* 10:33 vgutierrez: upload acme-chief 0.28 to apt.wm.o (buster) - [[phab:T259338|T259338]]
* 10:18 moritzm: installing imagemagick security updates on stretch
* 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 for MCR and PK change [[phab:T259524|T259524]]', diff saved to https://phabricator.wikimedia.org/P12156 and previous config saved to /var/cache/conftool/dbconfig/20200804-100035-marostegui.json
* 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12155 and previous config saved to /var/cache/conftool/dbconfig/20200804-095608-marostegui.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P12154 and previous config saved to /var/cache/conftool/dbconfig/20200804-094909-marostegui.json
* 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:58 moritzm: installing python3.5 security updates
* 08:15 moritzm: installing remaining cups security updates
* 08:13 XioNoX: cleaning up a bunch of prefix limit reached issues
* 08:00 marostegui: Failover m2 from db1132 to db1107 - [[phab:T257540|T257540]]
* 07:54 moritzm: installing poppler security updates on stretch
* 07:43 jayme: imported helm_2.16.9-1 to jessie-wikimedia
* 07:43 jayme: imported helm_2.16.9-1 to stretch-wikimedia
* 07:38 jayme: imported helm_2.16.9-1 to buster-wikimedia
* 07:34 elukey: upgrade druid analytics (backend for Turnilo/Superset/etc..) to 0.19
* 07:32 XioNoX: remove nonstop-bridging from fasw-c-eqiad switches - [[phab:T191667|T191667]]
* 07:29 XioNoX: remove nonstop-bridging from eqiad asw2 switches - [[phab:T191667|T191667]]
* 07:28 XioNoX: remove nonstop-bridging from asw2-esams - [[phab:T191667|T191667]]
* 07:27 marostegui: Start topology changes on m2 - [[phab:T257540|T257540]]
* 07:25 moritzm: installing rails security updates
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P12153 and previous config saved to /var/cache/conftool/dbconfig/20200804-064223-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12152 and previous config saved to /var/cache/conftool/dbconfig/20200804-063026-marostegui.json
* 06:27 _joe_: restarting docker daemon on kubestage1002, seems like a case of https://github.com/moby/moby/issues/29635
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Restore original weight to db1089 on main traffic', diff saved to https://phabricator.wikimedia.org/P12151 and previous config saved to /var/cache/conftool/dbconfig/20200804-062358-marostegui.json
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12150 and previous config saved to /var/cache/conftool/dbconfig/20200804-062256-marostegui.json
* 06:19 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 06:13 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: re-enabling lilypond execution in safe mode 3rd attempt (duration: 00m 58s)
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1089 on main traffic', diff saved to https://phabricator.wikimedia.org/P12149 and previous config saved to /var/cache/conftool/dbconfig/20200804-061255-marostegui.json
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P12148 and previous config saved to /var/cache/conftool/dbconfig/20200804-061209-marostegui.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 for MCR', diff saved to https://phabricator.wikimedia.org/P12147 and previous config saved to /var/cache/conftool/dbconfig/20200804-061003-marostegui.json
* 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for reimage', diff saved to https://phabricator.wikimedia.org/P12146 and previous config saved to /var/cache/conftool/dbconfig/20200804-051843-marostegui.json
* 05:04 marostegui: Reboot db1107 to pick up the last kernel
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into API', diff saved to https://phabricator.wikimedia.org/P12145 and previous config saved to /var/cache/conftool/dbconfig/20200804-050150-marostegui.json
* 03:56 legoktm: added Arlo to wmf-deployment Gerrit group
* 03:53 legoktm: added subbu to wmf-deployment Gerrit group


== 2020-08-03 ==
== 2021-05-02 ==
* 23:43 mutante: mwdebug1001 - temp installing apt-file for debugging an issue on mwmaint
* 13:40 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 23:14 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on fawiki ([[phab:T253291|T253291]]) (duration: 00m 59s)
* 13:40 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 21:35 sbassett: Deployed mitigations for [[phab:T115888|T115888]]
* 21:14 sbassett@deploy1001: Synchronized php-1.36.0-wmf.2/resources/src/mediawiki.jqueryMsg/mediawiki.jqueryMsg.js: (no justification provided) (duration: 01m 00s)
* 18:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:13 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 18:09 dcausse@deploy1001: Finished deploy [wdqs/wdqs@20dcff3]: deploy 0.3.43 and gui update (duration: 15m 53s)
* 17:53 dcausse@deploy1001: Started deploy [wdqs/wdqs@20dcff3]: deploy 0.3.43 and gui update
* 17:33 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
* 17:28 dcausse@deploy1001: Finished deploy [wdqs/wdqs@20dcff3]: (no justification provided) (duration: 00m 35s)
* 17:28 dcausse@deploy1001: Started deploy [wdqs/wdqs@20dcff3]: (no justification provided)
* 16:58 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.36.0-wmf.1"
* 16:21 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 16:16 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 16:02 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 15:55 _joe_: regenerating the TLS certs for blubberoid
* 15:33 XioNoX: standardize all routers routing-options config
* 15:27 marostegui: Change PK on frwiktionary.revision on db2087:3317, db2129, db2121 db2086:3317 [[phab:T259524|T259524]]
* 15:16 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:12 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:12 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P12143 and previous config saved to /var/cache/conftool/dbconfig/20200803-145111-marostegui.json
* 14:40 moritzm: update Buster netboot images to Buster 10.5 [[phab:T259519|T259519]]
* 14:33 XioNoX: disable all ALGs from pfw3-codfw
* 14:28 XioNoX: remove IGMP and PIM from pfw3-codfw security zones
* 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into dump and depool db1106', diff saved to https://phabricator.wikimedia.org/P12142 and previous config saved to /var/cache/conftool/dbconfig/20200803-142749-marostegui.json
* 14:27 XioNoX: remove nonstop-bridging from fasw-c-codfw - [[phab:T191667|T191667]]
* 14:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:20 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:04 filippo@deploy1001: Finished deploy [librenms/librenms@413e006]: Upgrade LibreNMS to 1.66 - [[phab:T257017|T257017]] (duration: 00m 23s)
* 14:03 filippo@deploy1001: Started deploy [librenms/librenms@413e006]: Upgrade LibreNMS to 1.66 - [[phab:T257017|T257017]]
* 14:00 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin A:puppetmaster 'enable-puppet "cdanis deploying I92e9a05"'
* 13:56 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin A:puppetmaster 'disable-puppet "cdanis deploying I92e9a05"'
* 13:27 moritzm: installing libopenmpt security updates
* 13:15 XioNoX: remove nonstop-bridging from asw-d-codfw - [[phab:T191667|T191667]]
* 13:14 XioNoX: remove nonstop-bridging from asw-c-codfw - [[phab:T191667|T191667]]
* 13:12 XioNoX: remove nonstop-bridging from asw-b-codfw - [[phab:T191667|T191667]]
* 13:11 XioNoX: remove nonstop-bridging from asw-a-codfw - [[phab:T191667|T191667]]
* 13:05 moritzm: installing json-c security updates
* 12:53 XioNoX: move VRRP master to cr3-eqsin
* 12:32 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
* 12:26 moritzm: installing apache-log4j1.2 security updates
* 12:20 moritzm: restarting nginx on francium to pick up luajit update
* 12:13 kormat: disabling puppet on cumin hosts [[phab:T259021|T259021]]
* 11:55 moritzm: installing luajit security updates
* 11:20 moritzm: installing ruby-rack security updates
* 11:19 Urbanecm: EU B&C done
* 11:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|346138d95721c274d450568388fb2ad1803dba9e}}: Add extra namespaces for yuewiktionary ([[phab:T258913|T258913]]) (duration: 01m 06s)
* 11:12 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8c2a2b2187fcc33c37b9d0f6bafcc963afd0b74b}}: Add gpophotoeng.gov.il to the wgCopyUploadsDomains allowlist for commonswiki ([[phab:T258857|T258857]]) (duration: 01m 07s)
* 11:03 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: {{Gerrit|ead6b9eb699594583b06b8f5c23d40d9add2eb49}}: New throttle rule for Czech editathon ([[phab:T259352|T259352]]) (duration: 01m 06s)
* 11:03 moritzm: installing ruby2.5 security updates
* 11:01 moritzm: removing cloudcephmon100[1-3].wikimedia.org from debmonitor (these eventually got re-installed as cloudcephmon100[1-3].eqiad.wmnet)
* 10:51 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:618020{{!}} Bumping portals to master (T128546)]] (duration: 01m 06s)
* 10:50 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:618020{{!}} Bumping portals to master (T128546)]] (duration: 01m 08s)
* 10:29 moritzm: installing NSS security updates on buster
* 10:26 moritzm: restarting Apache on puppetboard to pick up curl security updates
* 10:19 moritzm: restarting wtp1025 (parsoid canary) to pick up curl security updates
* 09:46 moritzm: restarting mw1261-mw1265 to pick up curl security updates
* 09:42 moritzm: installing curl security updates on stretch
* 08:59 moritzm: installing ffmpeg security updates on jobrunners/video scalers (3.2.15 rebuilt with VP9/row-mt patches)
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089 into API', diff saved to https://phabricator.wikimedia.org/P12141 and previous config saved to /var/cache/conftool/dbconfig/20200803-082641-marostegui.json
* 08:25 moritzm: installing qemu security updates on stretch
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12140 and previous config saved to /var/cache/conftool/dbconfig/20200803-082533-marostegui.json
* 08:22 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify s5 wikis [[phab:T259437|T259437]] (duration: 01m 05s)
* 08:21 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify s5 wikis [[phab:T259437|T259437]] (duration: 01m 40s)
* 08:07 elukey: roll restart aqs on aqs* to pick up new druid settings
* 07:10 marostegui: Remove revision triggers from db2095:3317 for MCR changes [[phab:T238966|T238966]]
* 07:09 marostegui: Deploy MCR change on s7 codfw, lag will appear on codfw [[phab:T238966|T238966]]
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12139 and previous config saved to /var/cache/conftool/dbconfig/20200803-070702-marostegui.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12138 and previous config saved to /var/cache/conftool/dbconfig/20200803-052715-marostegui.json
* 05:04 marostegui: Remove db1108:3321 and db1108:3322 from tendril and add db1108:3351 and db1108:3352 [[phab:T254462|T254462]]
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106 after compression', diff saved to https://phabricator.wikimedia.org/P12137 and previous config saved to /var/cache/conftool/dbconfig/20200803-050148-marostegui.json


== 2020-08-01 ==
== 2021-05-01 ==
* 16:30 Amir1: wikiadmin@10.64.32.197(avkwiki)> delete from site_identifiers; ([[phab:T259122|T259122]])
* 19:12 Urbanecm: Invalidate password for MaraBot@SUL ([[phab:T281586|T281586]])
* 16:27 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T259122|T259122]])
* 16:58 legoktm@deploy1002: Synchronized logos/config.yaml: Add eswiki 20th anniversary logos (duration: 00m 57s)
* 16:56 legoktm@deploy1002: Synchronized wmf-config/logos.php: Use eswiki 20th anniversary logos ([[phab:T280908|T280908]]) (duration: 00m 56s)
* 16:50 legoktm@deploy1002: Synchronized static/images/project-logos/: Add eswiki 20th anniversary logos (duration: 00m 57s)
* 07:22 elukey: powercycle elastic2033 - no ssh, no tty available via mgmt


==Archives==

Revision as of 23:34, 3 August 2021

2021-08-03

  • 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Re-enable commonswiki sister search (T277225) (duration: 01m 07s)
  • 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for T287988 (T281158)
  • 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
  • 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: 7d286dc: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 3/3) (duration: 01m 07s)
  • 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: 7d286dc: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 2/3) (duration: 01m 07s)
  • 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: 7d286dc: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 1/3) (duration: 01m 07s)
  • 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
  • 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
  • 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: T286463
  • 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: T286463
  • 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 1/3) (duration: 00m 37s)
  • 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 2/3) (duration: 00m 37s)
  • 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 2/3) (duration: 01m 07s)
  • 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: 2d4ea75: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos (T287988; 1/3) (duration: 01m 08s)
  • 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
  • 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
  • 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
  • 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
  • 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
  • 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
  • 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
  • 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
  • 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
  • 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:31 ryankemper: T285355 `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
  • 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
  • 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
  • 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
  • 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
  • 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
  • 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
  • 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
  • 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer (T286853)
  • 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
  • 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
  • 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: T286642 fixes to bulk daemon prioritization (duration: 00m 48s)
  • 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: T286642 fixes to bulk daemon prioritization
  • 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
  • 16:59 hashar: Gerrit has been upgraded
  • 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
  • 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
  • 16:45 urbanecm: Start server side upload for 1 video file (T287957)
  • 16:45 hashar: Stopping Gerrit for upgrade
  • 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
  • 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
  • 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
  • 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
  • 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) T286206
  • 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
  • 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
  • 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
  • 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
  • 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
  • 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
  • 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
  • 12:47 moritzm: restarting Tomcat on idp1001
  • 12:05 moritzm: installing libgcrypt20 security updates
  • 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
  • 11:36 moritzm: updated bullseye d-i images to rc3 T275873
  • 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - T222113
  • 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
  • 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - T222113
  • 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:13 moritzm: rename Ganeti group for test cluster to row_D T286206
  • 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
  • 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
  • 09:18 marostegui: Failover m1, m2 and m3-master T287574
  • 09:12 moritzm: installing php 7.0 security updates on stretch
  • 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - T286054
  • 08:57 moritzm: installing pillow security updates on stretch
  • 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
  • 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
  • 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
  • 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
  • 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
  • 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
  • 06:31 kart__: Updated cxserver to 2021-08-02-164000-production (T286473)
  • 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
  • 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
  • 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
  • 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
  • 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
  • 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
  • 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)

2021-08-02

  • 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
  • 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 23:21 legoktm: Previous sync also deployed c38998f03f "Stop enabling DPL on new wikis" (T287380)
  • 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
  • 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
  • 21:31 tzatziki: removing 1 file for legal compliance
  • 21:16 tzatziki: removing 7 files for legal compliance
  • 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features (T287868, T287874, T287873)
  • 19:00 urbanecm: Morning B&C window completed
  • 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: bebf4a9: Enable Growth features on a couple of wikis in dark mode (T287868, T287874, T287873; 2/2) (duration: 00m 56s)
  • 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bebf4a9: Enable Growth features on a couple of wikis in dark mode (T287868, T287874, T287873; 1/2) (duration: 00m 57s)
  • 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - T287652 (duration: 00m 56s)
  • 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features (T287876, T287871, T287878, T287880, T287875, T287879, T287872)
  • 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 18cd360: Growth features: Enable features in dark mode on a few wikis (T287876, T287871, T287878, T287880, T287875, T287879, T287872; 2/2) (duration: 00m 56s)
  • 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 18cd360: Growth features: Enable features in dark mode on a few wikis (T287876, T287871, T287878, T287880, T287875, T287879, T287872; 1/2) (duration: 00m 56s)
  • 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis (T287876, T287871, T287878, T287880, T287875, T287879, T287872)
  • 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ee47f9d: Add rollbacker group for kswiki (T286789) (duration: 00m 56s)
  • 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eec997c: Enable SUL autologin for wikimania.wikimedia.org (T285197) (duration: 00m 55s)
  • 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: 05cf1d6: Add a link: Show article extract instead of description in the link inspector (T287636; 2/2) (duration: 00m 56s)
  • 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: 05cf1d6: Add a link: Show article extract instead of description in the link inspector (T287636; 1/2) (duration: 00m 57s)
  • 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cc8ca45: Add tewikisource as import source for tewikibooks (T286978) (duration: 00m 56s)
  • 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 11e96ba: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T287264) (duration: 00m 56s)
  • 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: 97b6897: Remove unused enwiki celebration logos (T272108) (duration: 00m 57s)
  • 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: 16f9794: Remove unused eswiki celebration logos (T280908) (duration: 00m 57s)
  • 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 15:44 jynus: remove s2 from db1139 T287230
  • 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
  • 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
  • 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
  • 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
  • 13:02 mutante: gerrit1001 - restarting service after 706049
  • 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
  • 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
  • 12:20 mutante: gerrit servers: disabling puppet
  • 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: T287528 (duration: 00m 57s)
  • 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: T287780 (duration: 00m 57s)
  • 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
  • 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
  • 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
  • 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: T287782 (duration: 00m 56s)
  • 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
  • 11:29 hashar: restarting gerrit primary server on gerrit1001
  • 11:27 hashar: restarting Jenkins on contint2001
  • 11:27 hashar: restarting Jenkins on contint1001
  • 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
  • 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
  • 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:13 urbanecm: EU B&C window completed
  • 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 43020b7: votewiki: Enable Single Transferable Vote (T283728) (duration: 00m 57s)
  • 11:08 moritzm: installing openjdk-11 security updates
  • 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 26bcaaf: Restore logging for mediamoderation script to better understand high error rate occurring when running script (T287511) (duration: 00m 57s)
  • 07:53 moritzm: catch up bullseye installs with latest state of testing
  • 07:24 moritzm: installing libsndfile security updates on buster
  • 07:12 moritzm: installing aspell security updates
  • 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)

2021-07-31

2021-07-30

  • 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 22:20 eileen: civicrm revision is 158ed65e00, config revision is 6011d9c471
  • 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
  • 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - T287760 (duration: 00m 57s)
  • 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
  • 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
  • 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
  • 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
  • 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
  • 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
  • 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
  • 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
  • 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
  • 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
  • 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
  • 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
  • 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
  • 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
  • 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
  • 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
  • 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
  • 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
  • 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
  • 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
  • 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
  • 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
  • 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
  • 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
  • 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
  • 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
  • 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
  • 13:26 joe: uploaded docker-report 0.0.13 to buster
  • 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
  • 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
  • 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
  • 11:23 moritzm: installing libsndfile security updates on stretch
  • 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
  • 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
  • 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
  • 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
  • 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. T284592
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
  • 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails T286273 (duration: 00m 57s)
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
  • 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails T286273 (duration: 00m 57s)
  • 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json

2021-07-29

  • 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Merge new configs with existing testwiki definition (duration: 00m 57s)
  • 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16 refs T281157
  • 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704) (duration: 01m 09s)
  • 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15 refs T281157
  • 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16 refs T281157
  • 18:37 urbanecm@deploy1002: Finished scap: 796fe8e: 927763c: SecurePoll backports (T283728, T284585) (duration: 17m 06s)
  • 18:19 urbanecm@deploy1002: Started scap: 796fe8e: 927763c: SecurePoll backports (T283728, T284585)
  • 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
  • 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: 9a2383d: Display: Use HTML "dir" attribute for ltr/rtl (T287649) (duration: 01m 25s)
  • 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
  • 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
  • 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:11 mmandere: pool lvs1013.eqiad.wmnet - T286032
  • 15:09 mmandere: pool dns1001.wikimedia.org - T286032
  • 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - T286032
  • 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
  • 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
  • 14:46 mmandere: depool lvs1013 - T286032
  • 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
  • 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
  • 14:39 mmandere: depool dns1001 - T286032
  • 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
  • 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
  • 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
  • 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
  • 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - T286032
  • 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:11 vgutierrez: restart pybal on lvs2009
  • 14:09 vgutierrez: restart pybal on lvs2010
  • 14:07 vgutierrez: restart pybal on lvs2008
  • 14:05 vgutierrez: restart pybal on lvs2007
  • 13:59 vgutierrez: restart pybal on lvs1014
  • 13:55 vgutierrez: restart pybal on lvs1015
  • 13:52 _joe_: restarting pybal on lvs1016
  • 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
  • 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
  • 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
  • 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
  • 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 T287230', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
  • 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
  • 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
  • 09:47 moritzm: installing MariaDB 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
  • 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
  • 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
  • 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
  • 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
  • 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 07:52 moritzm: restarting Tomcat on idp-test
  • 06:41 XioNoX: push pfw policies - T287203
  • 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
  • 01:08 eileen: civicrm revision changed from 739c936298 to 158ed65e00, config revision is 6011d9c471

2021-07-28

  • 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wgSkipSkins: Update defaults, hide modern (T287616) (duration: 01m 06s)
  • 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: Disable mobile contributions simplifications on Wikidata and Commons (T283988) (duration: 01m 58s)
  • 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16 refs T281157 (duration: 01m 06s)
  • 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16 refs T281157
  • 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
  • 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
  • 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
  • 18:14 jbond: manually cleared out the puppetdb2002 queue
  • 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
  • 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
  • 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 15:58 ryankemper: T287112 [WDQS] Re-pooled `wdqs2002`
  • 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
  • 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing (T279309)
  • 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
  • 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
  • 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
  • 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
  • 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
  • 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
  • 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
  • 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
  • 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
  • 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
  • 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
  • 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
  • 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
  • 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
  • 13:29 moritzm: installing python2.7 security updates on stretch
  • 13:08 moritzm: installing python3.5 security updates on stretch
  • 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 11:27 moritzm: installing nginx security updates on thumbor*
  • 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
  • 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 10:11 moritzm: installing remaining nginx security updates on stretch
  • 10:09 godog: temp fix prometheus-icinga-am on alert1001
  • 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 09:40 urbanecm: Start server-side upload for 1 video file (T287482)
  • 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
  • 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
  • 08:27 Amir1: running several long-running queries against pc1007
  • 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 07:53 moritzm: installing aspell security updates on stretch
  • 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: T287559
  • 07:07 godog: remove cloud*/syslog.log from centrallog2001 - T287559
  • 07:06 godog: remove node_pinger.prom from node-pinger hosts
  • 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
  • 02:43 TimStarling: on mwmaint2002 fixing T286273 broken files using eval.php

2021-07-27

  • 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: Restore print, links, table and message box styles (T278896) (duration: 01m 07s)
  • 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable user links on office + test wikis (T287391) (duration: 02m 00s)
  • 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
  • 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
  • 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
  • 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
  • 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
  • 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
  • 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
  • 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - T287210 (duration: 02m 28s)
  • 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
  • 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
  • 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
  • 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
  • 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
  • 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
  • 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
  • 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
  • 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
  • 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
  • 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
  • 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
  • 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - T287238
  • 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) T147505
  • 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
  • 15:17 mmandere: pool lvs1014.eqiad.wmnet - T286061
  • 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
  • 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
  • 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 T286061
  • 15:11 mmandere: pool authdns1001.wikimedia.org - T286061
  • 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
  • 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - T286061
  • 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
  • 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
  • 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
  • 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 T287230', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
  • 14:53 moritzm: disabling puppet for upcoming row B maintenance
  • 14:52 mmandere: depool lvs1014 - T286061
  • 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
  • 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
  • 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
  • 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
  • 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
  • 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
  • 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
  • 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
  • 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
  • 14:40 mmandere: depool authdns1001 - T286061
  • 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
  • 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
  • 14:33 mmandere: depool cp10[79-82].eqiad.wmnet - T286061
  • 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
  • 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
  • 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - T287238
  • 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
  • 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 T287230', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
  • 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:11 moritzm: installing aspell security updates
  • 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
  • 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
  • 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:30 ottomata: deploying eventgate-analytics with native prometheus support. Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
  • 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
  • 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
  • 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
  • 11:23 Lucas_WMDE: EU backport+config window done
  • 11:20 oblivian@deploy1002: Synchronized debug.json: Config: Add the experimental kubernetes backend to mwdebug (T283056) (duration: 00m 56s)
  • 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add stream configuration for ContentTranslation events (T281982) (duration: 00m 58s)
  • 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
  • 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
  • 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
  • 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
  • 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
  • 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
  • 09:52 jynus: reverting query killer parameters on s3 codfw replicas
  • 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
  • 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
  • 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
  • 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
  • 08:57 _joe_: repooling mw225[12] for apis
  • 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
  • 08:36 jynus: reenabled puppet on mwmaint1002
  • 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
  • 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
  • 07:52 jynus: disabling puppet on mwmaint1002
  • 07:14 moritzm: installing krb security updates on buster
  • 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - T287238
  • 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Enable request language for RDF stubs in testwikidatawiki (T285795), Part II (duration: 00m 56s)
  • 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable request language for RDF stubs in testwikidatawiki (T285795), Part I (duration: 00m 57s)
  • 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 T287230', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json

2021-07-26

  • 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
  • 18:30 cstone: SmashPig revision changed from be272c02ce to 020d4eccd4
  • 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - T287394
  • 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
  • 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
  • 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
  • 16:06 legoktm: depooled mw2336.codfw.wmnet, mgmt is down too. T287394
  • 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
  • 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # T287122
  • 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: Don’t generate current content text twice, Part II (duration: 01m 49s)
  • 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: Don’t generate current content text twice, Part I (duration: 01m 50s)
  • 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
  • 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
  • 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
  • 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:15 XioNoX: rollback sampling for T286038
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
  • 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 07:18 _joe_: docker-image prune on deneb T287222
  • 07:17 _joe_: manage-production-images prune on deneb, T287222
  • 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
  • 06:39 moritzm: installing krb5 security updates
  • 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki

2021-07-24

  • 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see Phab:T280392 and Phab:T280397' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # T287321

2021-07-23

  • 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - T287110
  • 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw T287110
  • 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
  • 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 16:15 effie: enable puppet on mc-gp* hosts
  • 15:47 papaul: powerdown wdqs2002 for IDRAC reset
  • 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - T287238
  • 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging T285384
  • 14:16 brennen: gitlab1001: running ansible to deploy fix for puma exporter listen address (T275170)
  • 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - T271232 (duration: 03m 32s)
  • 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - T271232
  • 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
  • 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
  • 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
  • 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
  • 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - T287244
  • 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
  • 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
  • 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
  • 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
  • 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
  • 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
  • 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
  • 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
  • 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
  • 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
  • 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
  • 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
  • 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
  • 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
  • 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
  • 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
  • 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
  • 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
  • 03:11 ryankemper: T287223 Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
  • 03:09 ryankemper: T287223 Installed `nginx-light` on all of `elastic1*` (eqiad)
  • 03:06 ryankemper: T287223 Installed `nginx-light` on all of `elastic2*` (codfw)
  • 02:53 ejegg: updated Fundraising CiviCRM from 819c11307d to 739c936298
  • 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
  • 01:28 ejegg: updated payments-wiki from 844b59ee42 to cc5d14ea7f
  • 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # T287222

2021-07-22

  • 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: Make sure enable responsive mode UI reflects actual preference value (T285402) (duration: 00m 56s)
  • 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platform migrations - T282855 T238138 T282562 T271168 (duration: 00m 55s)
  • 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
  • 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
  • 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
  • 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
  • 19:00 urbanecm: Start server-side upload for 1 video file (T287061)
  • 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - T271232 (duration: 03m 22s)
  • 18:58 urbanecm: Start server-side upload for 1 video file (T286489)
  • 18:56 urbanecm: Start server-side upload for 1 video file (T286665)
  • 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - T271232
  • 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # T286500
  • 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 26c23de: hewikisource: Add namespace aliases (T286500) (duration: 00m 55s)
  • 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
  • 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 599c220: enwikisource: Create upload-shared user group (T285130) (duration: 00m 56s)
  • 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - T271232 (duration: 03m 18s)
  • 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - T271232
  • 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6a90930: Enable the visual editor on the 2021 namespace on Wikimania wiki (T287197) (duration: 00m 55s)
  • 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f765832: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T287204) (duration: 00m 55s)
  • 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 18:10 legoktm: testing dc switchover warmup script in eqiad
  • 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
  • 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
  • 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
  • 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
  • 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
  • 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
  • 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
  • 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
  • 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
  • 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
  • 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
  • 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
  • 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
  • 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
  • 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
  • 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
  • 16:56 brennen: gitlab1001: running ansible to deploy gerrit:706396 (T275170)
  • 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
  • 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
  • 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
  • 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
  • 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
  • 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
  • 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
  • 15:45 marostegui: Stop db2091 for onsite maintenance
  • 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
  • 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
  • 15:14 mmandere: pool lvs1015 - T286065
  • 15:14 jynus: shutdown db2097 for hw servicing T287072
  • 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
  • 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - T286065
  • 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
  • 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
  • 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
  • 14:47 mmandere: depool lvs1015 - T286065
  • 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
  • 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
  • 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - T286065
  • 14:29 effie: restarting pybal in lvs2009 and lvs1015
  • 14:27 moritzm: installing libwebp security updates on stretch
  • 14:25 effie: restarting pybal in lvs2010 and lvs1016
  • 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0208fc2: Growth: Add mentor dashboard related config (T278920) (duration: 00m 55s)
  • 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
  • 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
  • 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
  • 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
  • 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
  • 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
  • 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
  • 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
  • 11:36 Lucas_WMDE: EU backport+config window done
  • 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
  • 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: Avoid using MWHttpRequest::factory() (2/2) (duration: 01m 04s)
  • 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: Avoid using MWHttpRequest::factory() (1/2) (duration: 01m 04s)
  • 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: Avoid using WikiPage::factory() (duration: 01m 06s)
  • 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
  • 10:45 effie: restart pybal on lvs2009 and lvs1015
  • 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
  • 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
  • 10:42 effie: restart pybal on lvs2010 and lvs1016
  • 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
  • 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
  • 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
  • 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
  • 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
  • 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - T287110
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - T287110
  • 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
  • 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump T286888', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
  • 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
  • 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
  • 05:31 ryankemper: T281327 [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
  • 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
  • 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE

2021-07-21

  • 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis (T257066) (duration: 01m 03s)
  • 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:41 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
  • 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
  • 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:27 dancy: testing upcoming Scap release on beta
  • 18:27 ryankemper: T281327 [Elastic] `sudo -i wmf-auto-reimage-host -p T281327 elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
  • 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
  • 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
  • 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
  • 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
  • 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: 1453831: Do not teardown newtopictool interface if it was not setup (T287035) (duration: 01m 04s)
  • 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: aca510b: Do not teardown newtopictool interface if it was not setup (T287035) (duration: 01m 05s)
  • 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
  • 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
  • 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
  • 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # T285811
  • 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085) (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
  • 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
  • 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
  • 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
  • 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
  • 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
  • 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
  • 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
  • 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
  • 15:17 moritzm: installing intel-microcode security updates on stretch
  • 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
  • 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for T286679 (duration: 04m 45s)
  • 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for T286679
  • 14:40 papaul: powerdown ms-be2038 for BBU replacement
  • 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
  • 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # T280197 (duration: 00m 09s)
  • 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # T280197
  • 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
  • 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
  • 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 T287036
  • 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
  • 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
  • 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
  • 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
  • 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
  • 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
  • 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
  • 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
  • 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
  • 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
  • 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
  • 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
  • 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
  • 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
  • 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d6699da: GrowthExperiments: Add more wikis to linkrecommendation experiment (T284481) (duration: 01m 31s)
  • 10:50 moritzm: installing systemd security updates on bullseye
  • 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
  • 10:14 effie: enable puppet on mw* servers
  • 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - T287038
  • 09:34 jynus: restart db2097 T287072
  • 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
  • 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # T281156 (duration: 45m 51s)
  • 08:45 effie: disable puppet on codfw mw hosts to deploy 702592
  • 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 08:31 godog: upgrade karma on alert hosts - T284213
  • 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 T281058
  • 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 T281058
  • 08:17 effie: enable puppet on alert*
  • 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # T281156
  • 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
  • 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
  • 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 07:56 XioNoX: push extra sampling on cr2-eqiad - T286038
  • 07:44 XioNoX: push extra sampling on cr1-eqiad - T286038
  • 07:38 XioNoX: update RIS peer IP on cr2-codfw
  • 07:16 godog: powercycle ms-be2048
  • 07:03 moritzm: installing systemd security updates on stretch
  • 06:51 effie: restart memcached on eqiad mc* hosts
  • 06:51 effie: enable puppet on mc* hosts
  • 06:35 effie: disable puppet on mc1* hosts and icinga - T271967
  • 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-07-20

  • 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: caa5a07: Set wgGEMentorDashboardBackendEnabled properly (T285811) (duration: 00m 57s)
  • 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: dafd953: updateMenteeData: Make it possible to disable script per-wiki (T285811) (duration: 00m 58s)
  • 18:57 urbanecm: Start server-side upload for 4 large PNG files (T285708)
  • 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
  • 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
  • 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
  • 17:06 rzl: enabled puppet on A:mw
  • 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
  • 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
  • 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
  • 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
  • 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
  • 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
  • 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
  • 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
  • 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
  • 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
  • 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
  • 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
  • 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
  • 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:23 vgutierrez: pool dns1002 - T286069
  • 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - T286069
  • 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
  • 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
  • 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
  • 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T281058
  • 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T281058
  • 14:53 urbanecm: Start server-side upload for 7 large PNG files (T285708)
  • 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
  • 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
  • 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
  • 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
  • 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
  • 14:46 vgutierrez: depool dns1002 - T286069
  • 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
  • 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
  • 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - T286069
  • 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 T281058
  • 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 T281058
  • 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T281058
  • 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T281058
  • 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
  • 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
  • 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T281058
  • 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T281058
  • 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
  • 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10|0[1-9]).codfw.wmnet
  • 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T281058
  • 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T281058
  • 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - T285643
  • 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T281058
  • 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T281058
  • 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
  • 12:44 moritzm: installing systemd security updates on buster
  • 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
  • 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
  • 11:58 Lucas_WMDE: EU config+backport window done
  • 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: Avoid using User::newFrom* methods (3/3) (duration: 00m 56s)
  • 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
  • 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
  • 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Avoid using User::newFrom* methods (2/3) (duration: 00m 56s)
  • 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: Avoid using User::newFrom* methods (1/3) (duration: 00m 56s)
  • 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: e52ae37: otrs_wikiwiki: Update logo to use VRT instead of OTRS (T280400; 3/3) (duration: 00m 56s)
  • 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: e52ae37: otrs_wikiwiki: Update logo to use VRT instead of OTRS (T280400; 2/3) (duration: 00m 56s)
  • 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: e52ae37: otrs_wikiwiki: Update logo to use VRT instead of OTRS (T280400; 1/3) (duration: 00m 57s)
  • 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add patroller group for ckbwiki (T285221) (duration: 00m 57s)
  • 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: Typo fix: "the the" -> "the" (T201491) (2/2, beta) (duration: 00m 56s)
  • 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Typo fix: "the the" -> "the" (T201491) (1/2, prod) (duration: 00m 57s)
  • 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update config for language switching on pilot wikis (T286459) (duration: 00m 59s)
  • 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
  • 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
  • 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T281058
  • 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T281058
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
  • 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
  • 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
  • 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
  • 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
  • 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
  • 03:17 eileen: civicrm revision changed from 20e9ef6bbb to 819c11307d, config revision is bb405c5232

2021-07-19

  • 20:48 urbanecm: Deploy security patch for T286884
  • 20:29 vgutierrez: pool text@codfw - T286921
  • 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877) (duration: 00m 58s)
  • 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
  • 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: Add sanity check to newRevisionFromRowAndSlots. (T286877) (duration: 00m 57s)
  • 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - T286921
  • 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - T286921
  • 18:46 brennen: gerrit1001: restarting gerrit
  • 18:40 vgutierrez: stop pybal on lvs2009 - T286921
  • 18:38 brennen: re-enabling puppet on gerrit1001
  • 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - T286921
  • 18:27 ryankemper: T264053 Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P{relforge*}' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
  • 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
  • 18:27 ryankemper: T264053 Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P{cloudelastic*}' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
  • 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - T286921
  • 18:20 vgutierrez: enabling pybal on lvs2007 - T286921
  • 18:19 ryankemper: T264053 Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P{elastic*}' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
  • 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
  • 18:06 dancy@deploy1002: Synchronized .pipeline: Config: pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately (duration: 00m 56s)
  • 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
  • 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
  • 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
  • 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
  • 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
  • 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
  • 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
  • 17:30 volans: running puppet on elastic2038 after network was restored
  • 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
  • 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
  • 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
  • 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
  • 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
  • 17:23 volans: running authdns-update to force-update authdns2001
  • 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:21 XioNoX: remove ns1 redirect - T286787
  • 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
  • 17:10 XioNoX: enable asw-a2-codfw access ports - T286787
  • 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - T286787
  • 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
  • 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
  • 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
  • 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
  • 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
  • 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
  • 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
  • 16:40 XioNoX: update asw-a2-codfw serial number - T286787
  • 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
  • 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
  • 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
  • 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
  • 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
  • 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
  • 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
  • 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
  • 16:21 mutante: depooled logstash2021 for dcops maintenance work
  • 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
  • 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
  • 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
  • 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
  • 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
  • 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 310be45f7 (duration: 00m 57s)
  • 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
  • 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
  • 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
  • 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
  • 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I2bdfbd258e (duration: 00m 57s)
  • 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I069c7b53 (duration: 00m 58s)
  • 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
  • 15:10 godog: +100G to prometheus/ops in codfw
  • 14:59 vgutierrez: rolling restart of eqiad pybal instances
  • 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
  • 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
  • 14:42 vgutierrez: rolling restart of codfw pybal instances
  • 14:33 vgutierrez: rolling restart of eqsin pybal instances
  • 14:23 vgutierrez: rolling restart of ulsfo pybal instances
  • 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
  • 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
  • 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
  • 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
  • 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
  • 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
  • 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
  • 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
  • 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
  • 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
  • 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
  • 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
  • 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
  • 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
  • 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
  • 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
  • 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
  • 11:40 moritzm: installing bluez security updates
  • 11:31 Lucas_WMDE: EU backport+config window done
  • 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Add config for updated PropertySuggester beta cluster (T285098) (beta-only) (duration: 00m 57s)
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
  • 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
  • 09:52 moritzm: imported megacli for bullseye-wikimedia T282272 T275873
  • 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
  • 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
  • 08:15 vgutierrez: depool codfw text traffic
  • 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
  • 03:26 twentyafterfour: restarted phd on phab1001
  • 03:25 twentyafterfour: investigating PHD failure

2021-07-16

  • 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
  • 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
  • 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
  • 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
  • 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
  • 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
  • 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P{elastic2*}' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
  • 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
  • 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
  • 15:48 vgutierrez: restart pybal on lvs2010
  • 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 15:24 godog: downtime flappy pages in codfw for 40 minutes
  • 15:14 godog: set alert2001 as active in netbox (was staged) - T247966
  • 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
  • 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
  • 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw T286787
  • 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
  • 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
  • 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
  • 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
  • 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
  • 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
  • 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
  • 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
  • 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers (T279309)
  • 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
  • 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
  • 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
  • 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
  • 12:39 mutante: mw1412 through mw1428 - set to active in netbox (T279309)
  • 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
  • 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
  • 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
  • 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
  • 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
  • 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
  • 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
  • 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
  • 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
  • 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
  • 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
  • 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
  • 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
  • 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
  • 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
  • 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
  • 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
  • 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for T273281
  • 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for T273281
  • 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for T273281
  • 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for T273281
  • 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures T286763', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
  • 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied T132839 workarounds)
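
The `selector:` values in the conftool entries above, such as `name=mw143[0-3].eqiad.wmnet`, read as regular expressions over host names. The following is a minimal sketch for previewing which hosts such a pattern would cover. It assumes the value is matched as a regular expression against the whole name, which is an assumption about the matching semantics and not conftool's actual code; `hosts_matching` and the candidate list are hypothetical.

```python
import re


def hosts_matching(selector_value, candidates):
    """Return the candidates that the selector's regex matches in full."""
    pattern = re.compile(selector_value)
    return [host for host in candidates if pattern.fullmatch(host)]


# Hypothetical candidate list covering the host range mentioned in the log.
candidates = [f"mw{n}.eqiad.wmnet" for n in range(1426, 1435)]

print(hosts_matching(r"mw143[0-3].eqiad.wmnet", candidates))
print(hosts_matching(r"mw142[6-8].eqiad.wmnet", candidates))
```

Under that assumption, the first pattern selects mw1430 through mw1433 and the second selects mw1426 through mw1428. The unescaped dots in the logged selectors match any character, which is harmless for these host names but worth remembering when writing broader patterns.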

2021-07-15

  • 23:32 brennen: checking stashbot: T286756
  • 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: Fix creation of mw.Message objects (T286385) (duration: 00m 57s)
  • 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # T285811
  • 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # T285811
  • 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # T285811
  • 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
  • 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki T284928
  • 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
  • 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
  • 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
  • 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
  • 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
  • 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
  • 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
  • 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: eebdc4d “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
  • 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: eebdc4d “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
  • 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: T286611 (duration: 01m 06s)
  • 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: T286611 (duration: 01m 07s)
  • 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
  • 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
  • 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
  • 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
  • 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
  • 16:40 ejegg: updated payments-wiki from d9892207c1 to 844b59ee42
  • 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
  • 16:27 ejegg: updated fundraising CiviCRM from e0d53c92b5 to 20e9ef6bbb
  • 16:24 ejegg: updated payments-wiki from 0e7800027a to 844b59ee42
  • 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
  • 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for T273281
  • 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for T273281
  • 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for T273281
  • 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for T273281
  • 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for T273281
  • 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for T273281
  • 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs: Allow admins of idwiki to change stablesettings (T268317), try II (duration: 01m 05s)
  • 15:03 Amir1: temporarily becoming admin on idwiki to debug T268317
  • 15:02 moritzm: installing nginx security updates on ms-fe*
  • 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for T273281
  • 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for T273281
  • 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
  • 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
  • 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for T273281
  • 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for T273281
  • 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade T273281
  • 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade T273281
  • 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
  • 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
  • 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
  • 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for T273281
  • 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for T273281
  • 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade T273281
  • 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade T273281
  • 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
  • 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
  • 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
  • 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
  • 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
  • 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
  • 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
  • 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet
  • 13:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
  • 13:05 mutante: mw1413 - pooling, was depooled for an unknown reason, don't see it in SAL, looks ok, scap pulled
  • 13:03 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1422.eqiad.wmnet
  • 13:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for T273281
  • 13:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for T273281
  • 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade T273281
  • 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade T273281
  • 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for T273281
  • 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for T273281
  • 12:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[3-5].eqiad.wmnet
  • 12:54 mutante: mw1423, mw1424, mw1425 - pooled as new API servers
  • 12:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
  • 12:53 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[3-5].eqiad.wmnet
  • 12:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
  • 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade T273281
  • 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade T273281
  • 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for T273281
  • 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for T273281
  • 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade T273281
  • 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade T273281
  • 12:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade T273281
  • 12:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade T273281
  • 12:34 mutante: mw1423, mw1424, mw1425 - scap pull
  • 12:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 12:09 mutante: mw1423,mw1424,mw1425 - rebooting
  • 11:48 moritzm: restarting restbase1028-1030 to pick up libuv security update
  • 11:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
  • 11:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
  • 11:47 mutante: mw1423, mw1424, mw1425 - initial puppet run, new API appservers going into production
  • 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Make idwiki use protect mode of flaggedrevs (T268317) (duration: 01m 07s)
  • 11:40 moritzm: restarting Etherpad to pick up libuv security update
  • 11:37 moritzm: restarting Turnilo to pick up libuv security update
  • 11:34 moritzm: installing libuv1 security updates
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 10 hosts
  • 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
  • 11:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
  • 11:05 volans@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
  • 10:56 volans: commented out cron-spam entries on thanos-fe2001, puppet is disabled, thanos-store.service fails to start - T285835
  • 10:41 godog: move wikibase.queryService.ui.app to wikibase.queryService.ui.index.app - T272128
  • 10:34 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 10:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 10:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 10:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 10:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 10:26 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
  • 10:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
  • 10:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 10:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:02 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 10:02 effie: disabling puppet on maps* for 704394
  • 09:38 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 09:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 09:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 T278619
  • 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 T278619
  • 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 T278619
  • 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 T278619
  • 09:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 T278619
  • 09:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 T278619
  • 09:11 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-8].eqiad.wmnet
  • 09:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T278619
  • 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T278619
  • 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T278619
  • 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T278619
  • 08:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T278619
  • 08:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T278619
  • 08:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T278619
  • 08:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T278619
  • 08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 08:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 08:29 elukey: sudo rm /etc/rawdog/en/feeds/847a7185.state* on planet1002 (corrupted file) - backup in /home/elukey + restart planet-update-en.service
  • 08:12 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-8].eqiad.wmnet
  • 08:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
  • 08:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
  • 07:48 moritzm: updated bullseye d-i image for latest daily build T275873
  • 07:31 godog: reimage thanos-fe2001 with bullseye - T285835
  • 07:23 elukey: restart planet-update-en.service on planet1002
  • 07:17 elukey: remove /etc/rawdog/en/{state,state.lock} on planet1002 (following what rawdog suggested) due to corrupted files (backups available in /home/elukey/en)
  • 06:51 elukey: restart phabricator_clean_tmp_files.service on phab1001 - transient error (tmp files already cleaned up)
  • 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/UserOptionsManager.php: don't delete non-existent rows (T286521) (duration: 01m 06s)
  • 06:47 tstarling@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: don't delete non-existent rows (T286521) (duration: 01m 07s)
  • 05:50 kart_: Updated cxserver to 2021-07-14-124232-production (T282369, T284450)
  • 05:47 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:43 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:41 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 00:00 twentyafterfour: phabricator update deployed.

2021-07-14

  • 23:23 eileen: civicrm revision changed from b1c63470bb to e0d53c92b5, config revision is bb405c5232
  • 21:19 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.14 (duration: 01m 05s)
  • 21:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.14
  • 21:08 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/User.php: Backport: Move saving user options to onTransactionPreCommitOrIdle (T286521) (duration: 01m 05s)
  • 20:58 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/User.php: Backport: Move saving user options to onTransactionPreCommitOrIdle (T286521) (duration: 01m 05s)
  • 20:51 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.14
  • 19:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/resources: Backport: Fix deprecated offset() on invalid DOM (T185629) (duration: 01m 07s)
  • 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@156a984]: fix trove-dashboard bug (duration: 04m 18s)
  • 19:26 andrew@deploy1002: Started deploy [horizon/deploy@156a984]: fix trove-dashboard bug
  • 19:17 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 19:17 nskaggs@cumin1001: Added views for new wiki: dagwiki T284456
  • 18:55 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 18:54 nskaggs@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
  • 18:54 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 18:36 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 18:36 nskaggs@cumin1001: Added views for new wiki: banwikisource T284390
  • 18:30 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 18:14 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 17:52 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
  • 17:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
  • 17:49 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
  • 17:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
  • 17:39 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 17:35 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 17:35 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: Do not lock preferences row for a rememberpassword check (T286521) (duration: 01m 06s)
  • 17:00 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: Do not lock preferences row for a rememberpassword check (T286521) (duration: 01m 05s)
  • 16:27 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
  • 16:26 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 16:11 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Translate: Backport: TranslationAid: Handle empty message definition (T285830) and TranslationAid: Make sure to return successfully fetched definitions (T285830) (duration: 01m 09s)
  • 16:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 15:37 moritzm: installing klibc security updates
  • 15:36 ottomata: deploying eventgate-analytics with direct service-runner prometheus support
  • 15:34 ryankemper: [Elastic] Manually triggering readahead mitigation across whole fleet to prevent any further issues today: `ryankemper@cumin1001:~$ sudo cumin -b 12 'P{elastic*}' 'sudo systemctl restart elasticsearch-disable-readahead.service'` (still need to investigate why `elasticsearch-disable-readahead.timer` isn't re-firing every 30 mins as desired)
  • 15:34 moritzm: installing apache security updates on otrs1001 (ticket.wikimedia.org)
  • 15:34 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 15:28 urbanecm: Start server-side upload of 3 large image files (T285708)
  • 15:16 moritzm: installing apache security updates on lists1001 (lists.wikimedia.org)
  • 14:51 moritzm: installing apache security updates on puppet masters
  • 14:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2384.codfw.wmnet
  • 14:47 effie: set mw2384 as inactive to investigate mw2383 issue - T286463
  • 14:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:44 moritzm: installing apache security updates on grafana*
  • 14:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:43 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:40 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:40 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1422.eqiad.wmnet
  • 14:33 dcausse: running elasticsearch-madvise-random ES_PID on elastic2045
  • 14:31 dcausse: running elasticsearch-madvise-random 1022 on elastic2054
  • 14:23 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:19 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:13 elukey: restart php-fpm on mw2370
  • 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
  • 13:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 13:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277118
  • 13:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277118
  • 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1005.eqiad.wmnet
  • 12:43 urbanecm: Start server-side upload of 3 large image files (T285708)
  • 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1005.eqiad.wmnet
  • 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
  • 12:23 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 12:15 mutante: mw1422 - scap pull
  • 12:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1422.eqiad.wmnet
  • 12:02 moritzm: upgrading python3-wmflib fleetwide to 0.0.8 (needed for new logout.d wrapper)
  • 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
  • 12:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
  • 11:52 mutante: mw1422 - new setup, not in prod yet
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
  • 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
  • 11:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
  • 11:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 11:49 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: Remove reviewer user group in ruwiki (T284589) (duration: 01m 05s)
  • 11:40 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
  • 11:39 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs: Reduce levels for ruwiki to 1 (T284589) (duration: 01m 05s)
  • 11:37 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
  • 11:23 ariel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet
  • 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 72027e1: Disable indexing in NS_USER and NS_USER_TALK on bnwiki (T286152) (duration: 02m 07s)
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4dc11d2: Change category name of Babel extension on Javanese Wikipedia (T286165) (duration: 02m 10s)
  • 10:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
  • 09:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277118
  • 09:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277118
  • 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277118
  • 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277118
  • 09:27 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php-1.37.0-wmf.14]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=testwiki # T285811
  • 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277118
  • 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277118
  • 07:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277118
  • 07:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277118
  • 07:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277118
  • 07:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277118
  • 07:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T277118
  • 07:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T277118
  • 00:58 eileen: process control updated to c291b3c
  • 00:58 eileen: c291b3c
  • 00:49 eileen: civicrm revision changed from bb62188ec6 to b1c63470bb, config revision is c291b3c689
  • 00:48 eileen: process-control config revision is c291b3c689
  • 00:15 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fix conf cache conflict with Defines.php noticed in beta (duration: 02m 09s)

2021-07-13

  • 23:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: f362736: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector (T286587) (duration: 02m 08s)
  • 23:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: f362736: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector (T286587) (duration: 02m 07s)
  • 23:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
  • 23:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
  • 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
  • 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
  • 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
  • 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
  • 22:22 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
  • 22:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
  • 22:18 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Use Score with lilypond's safe mode only (duration: 02m 06s)
  • 20:53 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 20:30 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/skins/Skin.php: Backport: links is flat array (T286040) (duration: 02m 07s)
  • 20:26 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.9 (duration: 04m 21s)
  • 20:19 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.14 (duration: 31m 56s)
  • 19:47 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.14
  • 19:02 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1283.eqiad.wmnet
  • 17:45 mutante: mw1283 - decom - powered off by cookbook
  • 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1283.eqiad.wmnet
  • 17:41 mutante: homer "asw2-a*eqiad*" commit "decom mw1282 - T280203"
  • 17:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
  • 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
  • 17:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
  • 17:09 mutante: mw1282 - decom, powered off
  • 17:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
  • 17:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1281.eqiad.wmnet
  • 17:05 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: Backport: Do not lock user_preferences before updating (T286521) (duration: 01m 58s)
  • 16:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Firmware upgrade T286226
  • 16:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Firmware upgrade T286226
  • 16:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade T286226
  • 16:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade T286226
  • 16:55 jbond: upload statograph to buster wikimedia
  • 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1281.eqiad.wmnet
  • 16:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom T28203
  • 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom T28203
  • 16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom T28203
  • 16:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom T28203
  • 16:25 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw128[1-3].eqiad.wmnet
  • 15:34 topranks: Adding IX peering to AS393950 (Xiber LLC) on cr2-eqiad.
  • 15:20 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 15:19 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 14:52 volker-e@deploy1002: Finished deploy [design/style-guide@5c07233]: Deploy design/style-guide: 5c07233 “Components”: Add WikimediaUI theme Figma links to various components (#483) (duration: 00m 06s)
  • 14:52 volker-e@deploy1002: Started deploy [design/style-guide@5c07233]: Deploy design/style-guide: 5c07233 “Components”: Add WikimediaUI theme Figma links to various components (#483)
  • 14:35 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 14:35 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 13:57 otto@deploy1002: Finished deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job - T271232 (duration: 03m 28s)
  • 13:53 otto@deploy1002: Started deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job - T271232
  • 13:37 effie: rolling restart php-fpm across clusters - T286260
  • 13:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/lib/includes/SimpleCacheWithBagOStuff.php: Backport: Send TTL instead of expiry in unix timestamp in calling BagOStuff (T286260) (duration: 00m 58s)
  • 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
  • 13:29 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
  • 13:14 kormat: restarted replication on db1117:3325 T284622
  • 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 1732 hosts
  • 13:10 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
  • 13:10 hashar: Upgraded Apache on gerrit1001 and gerrit2001
  • 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
  • 13:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
  • 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
  • 12:53 kormat: stopping replication on db1117:3325 T284622
  • 12:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 T284622
  • 12:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 T284622
  • 12:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
  • 12:41 mutante: depooling and decom'ing eqiad API servers mw1281, mw1282, mw1283 - T280203
  • 12:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[1-3].eqiad.wmnet
  • 12:20 mutante: mwmaint1002 - scap pull after reimaging
  • 11:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
  • 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
  • 11:28 Lucas_WMDE: EU backport+config window done
  • 11:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Remove obsolete $wgShowDBErrorBacktrace config (duration: 01m 25s)
  • 11:13 mutante: mwmaint1002 - reimaging with buster (T267607)
  • 10:54 mutante: switching https://noc.wikimedia.org backend from eqiad to codfw for mwmaint1002 OS upgrade, not affecting config-master/pybal, tests passed (T267607)
  • 10:44 moritzm: upgrading apache on phab1001 (phabricator.wikimedia.org)
  • 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
  • 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
  • 10:39 hnowlan: running `nodetool decommission` on maps2008
  • 10:27 moritzm: installing apache security updates on alert1001 (icinga.wikimedia.org)
  • 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277116
  • 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277116
  • 10:18 moritzm: installing apache security updates on Logstash hosts
  • 09:58 moritzm: upgrading PHP/Apache on matomo1002 (piwik.wikimedia.org)
  • 09:40 moritzm: installing apache security updates on thanos-fe hosts
  • 09:38 moritzm: installing apache security updates on parsoid hosts
  • 09:31 effie: depool mw2383 T286463
  • 09:18 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 09:15 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277116
  • 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277116
  • 08:59 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
  • 08:59 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
  • 08:45 effie: depool mw2383 - T286463
  • 08:02 moritzm: upgrade bullseye pilot installs to latest state of bullseye
  • 07:06 moritzm: installing apache security updates on codfw mw* hosts
  • 06:53 elukey: systemctl reset-failed ifup@ens5 on gitlab2001 - T273026
  • 06:06 effie: pool mw2383 - T286463
  • 04:09 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 04:05 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 08m 28s)
  • 03:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
  • 03:55 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 02m 22s)
  • 03:54 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.76` on canary `wdqs1003`; proceeding to rest of fleet
  • 03:53 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
  • 03:53 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.76`. Pre-deploy tests passing on canary `wdqs1003`

2021-07-12

  • 23:57 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1896efc: Add sayahna.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T286163) (duration: 00m 56s)
  • 23:51 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=T286396 # T286396
  • 23:50 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # T286396
  • 23:50 urbanecm: Delete Project:BROKENPesak at sr.wikipedia to be able to rerun namespaceDupes.php (T286396)
  • 23:45 urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # T286396
  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 284216a: Add few namespace aliases for Serbian Wikipedia (T286396) (duration: 00m 56s)
  • 23:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8a79bf7: enwiki: Delete Book namespace (T285766) (duration: 00m 57s)
  • 23:29 urbanecm@deploy1002: Synchronized static/images/: d007b9c: Remove unused celebration logos and wordmark (T286380) (duration: 00m 57s)
  • 23:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6c58149: Add editautoreviewprotected to bot on hewikisource (T275076) (duration: 00m 57s)
  • 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 40eade4: Enable RelatedArticles Extension in zhwikinews (T266933) (duration: 00m 57s)
  • 23:15 urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=zhwiktionary --fix --add-prefix=BROKEN # T286101, P16817
  • 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5ab00d1: zhwiktionary: Add templateeditor right (T286101) (duration: 00m 57s)
  • 23:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5822b2b: zhwiktionary: Add aliases for namespaces (T286101) (duration: 00m 57s)
  • 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ba0967f: zhwiktionary: Add Reconstruction namespace (T286101) (duration: 00m 57s)
  • 22:53 legoktm: root@urldownloader2002:/var/cache/apt# rm -rf * to free up space
  • 21:26 urbanecm: Start server-side upload for 2 video files (T286432, T286433)
  • 18:41 otto@deploy1002: Finished deploy [analytics/refinery@200b502]: Finalize event_default gobblin job - T271232 (duration: 03m 39s)
  • 18:37 otto@deploy1002: Started deploy [analytics/refinery@200b502]: Finalize event_default gobblin job - T271232
  • 18:12 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score using Shellbox on testwiki (T257066) (duration: 00m 58s)
  • 16:15 ppchelko@deploy1002: Finished deploy [restbase/deploy@b05ade3]: Add newly created wikis T284929 T284457 T284392 (duration: 21m 24s)
  • 16:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116 - extending downtime
  • 16:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116 - extending downtime
  • 15:54 ppchelko@deploy1002: Started deploy [restbase/deploy@b05ade3]: Add newly created wikis T284929 T284457 T284392
  • 15:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116
  • 15:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116
  • 15:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277116
  • 15:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277116
  • 15:24 elukey: expand ML k8s iBGP neighbors to include the master nodes (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/704104)
  • 15:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277116
  • 15:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277116
  • 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1002.wikimedia.org
  • 15:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277116
  • 15:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277116
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1002.wikimedia.org
  • 14:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change T277116
  • 14:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change T277116
  • 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1001.wikimedia.org
  • 14:44 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1001.wikimedia.org
  • 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2004.wikimedia.org
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2004.wikimedia.org
  • 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2003.wikimedia.org
  • 14:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2003.wikimedia.org
  • 14:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
  • 13:59 otto@deploy1002: Finished deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo - T271232 (duration: 03m 30s)
  • 13:56 otto@deploy1002: Started deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo - T271232
  • 13:52 otto@deploy1002: Finished deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - T271232 (duration: 03m 16s)
  • 13:49 otto@deploy1002: Started deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - T271232
  • 13:36 otto@deploy1002: Finished deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - T271232 (duration: 03m 37s)
  • 13:32 otto@deploy1002: Started deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - T271232
  • 12:51 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:48 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 12:42 volans: reverting Primary IP allocation for pc1011-1014, leaving only mgmt IPs - T282484
  • 12:34 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps2004.codfw.wmnet
  • 11:58 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Enable template search improvements on first wikis 2/2 (T284553) (duration: 00m 57s)
  • 11:54 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable template search improvements on first wikis 1/2 (T284553) (duration: 00m 56s)
  • 11:49 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/VisualEditor/modules/ve-mw/ui/widgets/ve.ui.MWTemplateTitleInputWidget.js: Backport: Always add 1 prefixsearch match when searching for templates (duration: 00m 57s)
  • 11:47 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps100[1-4].eqiad.wmnet
  • 11:45 hnowlan: adjusting weights of eqiad maps servers to reduce load on older spec machines
  • 11:40 moritzm: installing apache updates on mw1/eqiad hosts
  • 11:38 hnowlan: adjusting weights of codfw maps servers to reduce load on older spec machines
  • 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2004.codfw.wmnet
  • 11:34 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 773c956: Revert "Use ptwiki 20th anniversary logos" (T286380) (duration: 00m 57s)
  • 11:34 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2003.codfw.wmnet
  • 11:33 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2001.codfw.wmnet
  • 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cd5f537: Revert "ptwiki: Use celebration logos in new vector" (T286380) (duration: 00m 57s)
  • 11:26 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add 'editautoreviewprotected' protection level to hewikisource (T275076) (duration: 00m 57s)
  • 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
  • 11:19 hnowlan: testing a depool of maps2010 to ensure kartotherian load can cope with two less nodes
  • 11:12 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable transclusion back button on first wikis (T284553) (duration: 00m 58s)
  • 11:01 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2008.codfw.wmnet
  • 10:58 hnowlan: testing a depool of maps2008 to ensure kartotherian load can cope with one less node
  • 10:30 moritzm: installing apache updates on an-tool* hosts (affects Turnilo, Yarn, Superset, Hue) briefly
  • 10:11 elukey: add 10g disk to ml-serve-ctrl[12]00[12] for T285927
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
  • 10:05 mutante: planet - deleting state files, manually running update for all 161 en feeds - T285251
  • 10:03 effie: depool mw2383
  • 10:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
  • 10:01 godog: test thanos-compact upload with smaller part size - T285835
  • 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
  • 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1006.eqiad.wmnet
  • 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
  • 09:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
  • 09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1006.eqiad.wmnet
  • 09:07 godog: repool thanos-fe2002 - T285835
  • 08:38 godog: test a single frontend for thanos-swift / thanos-query to test "bad host" theory - T285835
  • 08:26 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/client: Backport: Remove subscribing to other aspect for entity usage (T286193) (duration: 00m 59s)
  • 07:44 jynus: restart db1102:x1 mariadb instance
  • 07:01 moritzm: installing apache2 security updates
  • 05:14 Amir1: start of mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --batch-size=10 --verbose --mime="application/pdf" --force --sleep 5 on screen - It will take days / week to finish (T275268)
  • 05:06 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: Enable json image metadata everywhere (T275268) (duration: 01m 05s)
  • 04:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/maintenance/refreshImageMetadata.php: Backport: Add --sleep option to refreshImageMetadata.php (duration: 01m 04s)
  • 04:10 Amir1: mwscript refreshImageMetadata.php --wiki=testcommonswiki --mediatype=OFFICE --batch-size=20 --verbose --mime="application/pdf" --force (T275268)
  • 04:08 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: Set testcommonswiki to use json image metadata (T275268) (duration: 01m 10s)
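
Many entries above come in START/END pairs logged by cookbooks (for example sre.hosts.decommission or sre.dns.netbox). The sketch below shows one way to pair them and estimate how long each run took. It is only an illustration: the regular expression is inferred from the entries on this page rather than from the cookbook tooling, the name `cookbook_durations` is hypothetical, and it assumes each pair falls on the same day and is listed newest-first, as here.

```python
import re
from datetime import datetime

# Pattern inferred from the log entries on this page (an assumption,
# not a format guaranteed by the cookbook tooling).
ENTRY_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<who>\S+): (?P<kind>START|END) "
    r"(?:\((?P<result>\w+)\) )?- "
    r"Cookbook (?P<name>\S+)(?: \(exit_code=\d+\))?(?P<rest>.*)"
)


def cookbook_durations(lines):
    """Pair START/END entries and return (name, target, result, minutes)."""
    open_runs = {}
    durations = []
    # The log is newest-first, so walk it backwards to see START before END.
    for line in reversed(lines):
        match = ENTRY_RE.search(line)
        if not match:
            continue
        target = match["rest"].strip()
        key = (match["who"], match["name"], target)
        stamp = datetime.strptime(match["time"], "%H:%M")
        if match["kind"] == "START":
            open_runs[key] = stamp
        elif key in open_runs:
            started = open_runs.pop(key)
            minutes = int((stamp - started).total_seconds() // 60)
            durations.append((match["name"], target, match["result"], minutes))
    return durations


sample = [
    "• 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2003.wikimedia.org",
    "• 14:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2003.wikimedia.org",
    "• 12:51 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)",
    "• 12:48 volans@cumin2002: START - Cookbook sre.dns.netbox",
]

for name, target, result, minutes in cookbook_durations(sample):
    print(f"{name} {target} [{result}]: about {minutes} min")
```

Since the log records times only to the minute, the computed durations are approximate. Entries that log START and END in the same minute (such as sre.hosts.downtime) come out as 0.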

2021-07-09

  • 23:28 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 23:27 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 22:36 legoktm: running benchmarking scripts against shellbox
  • 14:49 otto@deploy1002: Finished deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - T271232 (duration: 03m 08s)
  • 14:46 otto@deploy1002: Started deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - T271232
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1118', diff saved to https://phabricator.wikimedia.org/P16809 and previous config saved to /var/cache/conftool/dbconfig/20210709-115609-marostegui.json
  • 11:40 _joe_: deleting coredns pod in codfw, potentially causing T286360
  • 10:13 _joe_: recreated all pods for zotero in codfw
  • 00:47 legoktm: zotero rolling restart didn't help, filed T286360 for DNS issues
  • 00:39 legoktm: doing a rolling restart of zotero in codfw to hopefully fix DNS ENOTFOUND issues

2021-07-08

  • 22:48 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Add configuration to use Score with Shellbox (still disabled) (2/2) - T281423 (duration: 00m 57s)
  • 22:46 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add configuration to use Score with Shellbox (still disabled) (1/2) - T281423 (duration: 00m 58s)
  • 19:29 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/includes/Score.php: Allow setting a different path for `convert` just for Score (2/2) (duration: 00m 57s)
  • 19:27 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/extension.json: Allow setting a different path for `convert` just for Score (1/2) (duration: 00m 58s)
  • 18:56 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:55 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:53 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 17:02 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1] (duration: 05m 38s)
  • 16:56 joal@deploy1002: Started deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1]
  • 16:47 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1] (duration: 03m 17s)
  • 16:44 joal@deploy1002: Started deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1]
  • 15:37 otto@deploy1002: Finished deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - T271232 (duration: 03m 06s)
  • 15:34 otto@deploy1002: Started deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - T271232
  • 15:29 otto@deploy1002: Finished deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - T271232 (duration: 05m 27s)
  • 15:23 otto@deploy1002: Started deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - T271232
  • 15:11 otto@deploy1002: Finished deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - T271232 (duration: 05m 42s)
  • 15:05 otto@deploy1002: Started deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - T271232
  • 14:52 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add consumers.analytics_hadoop-ingestion stream config settings for automated gobblin imports - T271232 T273901 (duration: 01m 09s)
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16807 and previous config saved to /var/cache/conftool/dbconfig/20210708-134421-root.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16806 and previous config saved to /var/cache/conftool/dbconfig/20210708-132917-root.json
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16805 and previous config saved to /var/cache/conftool/dbconfig/20210708-131414-root.json
  • 13:04 otto@deploy1002: Finished deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - T271232 (duration: 03m 22s)
  • 13:01 otto@deploy1002: Started deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - T271232
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16804 and previous config saved to /var/cache/conftool/dbconfig/20210708-125910-root.json
  • 12:52 moritzm: installing klibc security updates on buster
  • 12:38 moritzm: installing openexr security updates
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103', diff saved to https://phabricator.wikimedia.org/P16803 and previous config saved to /var/cache/conftool/dbconfig/20210708-105353-marostegui.json
  • 10:20 jbond: upgrade golang-cfssl
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16802 and previous config saved to /var/cache/conftool/dbconfig/20210708-100947-root.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16801 and previous config saved to /var/cache/conftool/dbconfig/20210708-095443-root.json
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16800 and previous config saved to /var/cache/conftool/dbconfig/20210708-093939-root.json
  • 09:25 jbond: upload golang-github-cloudflare-cfssl_1.6.0-1_amd64 to bullseye
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16799 and previous config saved to /var/cache/conftool/dbconfig/20210708-092436-root.json
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2116', diff saved to https://phabricator.wikimedia.org/P16798 and previous config saved to /var/cache/conftool/dbconfig/20210708-092411-marostegui.json
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16797 and previous config saved to /var/cache/conftool/dbconfig/20210708-090456-root.json
  • 09:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16796 and previous config saved to /var/cache/conftool/dbconfig/20210708-084952-root.json
  • 08:50 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:42 moritzm: imported ganeti 2.16.0 for stretch-security/component/ganeti216 T284811
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16795 and previous config saved to /var/cache/conftool/dbconfig/20210708-083449-root.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16794 and previous config saved to /var/cache/conftool/dbconfig/20210708-081945-root.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2130', diff saved to https://phabricator.wikimedia.org/P16793 and previous config saved to /var/cache/conftool/dbconfig/20210708-081922-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16792 and previous config saved to /var/cache/conftool/dbconfig/20210708-060812-root.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16791 and previous config saved to /var/cache/conftool/dbconfig/20210708-055309-root.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16790 and previous config saved to /var/cache/conftool/dbconfig/20210708-053805-root.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16789 and previous config saved to /var/cache/conftool/dbconfig/20210708-052302-root.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092', diff saved to https://phabricator.wikimedia.org/P16788 and previous config saved to /var/cache/conftool/dbconfig/20210708-052216-marostegui.json
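
The dbctl entries above follow a staged pattern: a host is depooled, then repooled at 25%, 50%, 75% and 100%, roughly 15 minutes apart. The sketch below extracts those steps per host from log text. It is a rough illustration under stated assumptions: the pattern is inferred from the entries on this page, `repool_steps` is a hypothetical helper and not part of dbctl, and the sample lines are abbreviated copies of entries above.

```python
import re

# Pattern inferred from the "(re)pooling @ N%" entries on this page
# (an assumption about the log text, not a dbctl output specification).
REPOOL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<host>\S+) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_steps(lines):
    """Return {host: [(time, percent), ...]} sorted by time of day."""
    steps = {}
    for line in lines:
        match = REPOOL_RE.search(line)
        if match:
            steps.setdefault(match["host"], []).append(
                (match["time"], int(match["pct"]))
            )
    # The log is newest-first; sorting HH:MM strings restores
    # chronological order within a single day.
    for host in steps:
        steps[host].sort()
    return steps


# Abbreviated copies of entries above (trailing diff/config paths omitted).
sample = [
    "• 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 100%: Repool after index change'",
    "• 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 75%: Repool after index change'",
    "• 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 50%: Repool after index change'",
    "• 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 25%: Repool after index change'",
    "• 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2116'",
]

for host, host_steps in repool_steps(sample).items():
    print(host, host_steps)
```

The "Depool" entry is deliberately not matched, so only the ramp-up steps are returned. Hosts with a port suffix, such as db2087:3316, are kept as a single host key.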

2021-07-07

  • 20:22 legoktm: repooling eqiad - https://gerrit.wikimedia.org/r/703561
  • 18:06 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add Shellbox to {Production,Labs}Services.php (2/2) (duration: 00m 59s)
  • 18:05 legoktm@deploy1002: Synchronized wmf-config/LabsServices.php: Add Shellbox to {Production,Labs}Services.php (1/2) (duration: 00m 59s)
  • 18:04 otto@deploy1002: Finished deploy [analytics/refinery@46c0b84] (hadoop-test): Deploy for gobblin migration - Refine now supports gzip - T271232 (duration: 05m 28s)
  • 17:59 legoktm@deploy1002: Synchronized private/readme.php: Document $wgShellboxSecretKey in private/readme.php (duration: 01m 01s)
  • 17:58 otto@deploy1002: Started deploy [analytics/refinery@46c0b84] (hadoop-test): Deploy for gobblin migration - Refine now supports gzip - T271232
  • 17:54 otto@deploy1002: Finished deploy [analytics/refinery@46c0b84]: Deploy for gobblin migration - Refine now supports gzip - T271232 (duration: 17m 22s)
  • 17:36 otto@deploy1002: Started deploy [analytics/refinery@46c0b84]: Deploy for gobblin migration - Refine now supports gzip - T271232
  • 16:55 joal@deploy1002: Finished deploy [analytics/refinery@b5c4462]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@b5c4462] (duration: 03m 10s)
  • 16:52 joal@deploy1002: Started deploy [analytics/refinery@b5c4462]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@b5c4462]
  • 16:28 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:15 joal@deploy1002: Finished deploy [analytics/refinery@b5c4462] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@b5c4462] (duration: 10m 21s)
  • 16:05 joal@deploy1002: Started deploy [analytics/refinery@b5c4462] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@b5c4462]
  • 16:03 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:01 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:25 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:49 moritzm: installing djvulibre security updates
  • 14:05 _joe_: powercycling mw2267, stuck without network, blank console
  • 13:25 otto@deploy1002: Finished deploy [analytics/refinery@8de71e6] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin dir fixes - T271232 (duration: 05m 41s)
  • 13:19 otto@deploy1002: Started deploy [analytics/refinery@8de71e6] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin dir fixes - T271232
  • 13:13 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:13 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:12 otto@deploy1002: Finished deploy [analytics/refinery@8de71e6]: analytics cluster deploy for webrequest gobblin job migration - T271232 (duration: 03m 11s)
  • 13:09 otto@deploy1002: Started deploy [analytics/refinery@8de71e6]: analytics cluster deploy for webrequest gobblin job migration - T271232
  • 12:12 urbanecm: Start server-side upload for 3 video files (T286173, T286175, T286174)
  • 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx1002.wikimedia.org
  • 11:49 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host mx1002.wikimedia.org
  • 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx2002.wikimedia.org
  • 11:29 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host mx2002.wikimedia.org
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16782 and previous config saved to /var/cache/conftool/dbconfig/20210707-112149-root.json
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16781 and previous config saved to /var/cache/conftool/dbconfig/20210707-110645-root.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16780 and previous config saved to /var/cache/conftool/dbconfig/20210707-105142-root.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16779 and previous config saved to /var/cache/conftool/dbconfig/20210707-103638-root.json
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316', diff saved to https://phabricator.wikimedia.org/P16778 and previous config saved to /var/cache/conftool/dbconfig/20210707-103553-marostegui.json
  • 07:56 moritzm: bounced elasticsearch_5@production-logstash-eqiad on logstash1009
  • 07:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-07-06

  • 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:25 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0] (duration: 05m 31s)
  • 17:20 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0]
  • 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0] (duration: 00m 07s)
  • 17:19 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0]
  • 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0] (duration: 36m 59s)
  • 16:42 joal@deploy1002: Started deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0]
  • 15:54 otto@deploy1002: Finished deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration (duration: 05m 24s)
  • 15:48 otto@deploy1002: Started deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration
  • 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16777 and previous config saved to /var/cache/conftool/dbconfig/20210706-140049-root.json
  • 13:53 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 13:49 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
  • 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16776 and previous config saved to /var/cache/conftool/dbconfig/20210706-134545-root.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16775 and previous config saved to /var/cache/conftool/dbconfig/20210706-133041-root.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16774 and previous config saved to /var/cache/conftool/dbconfig/20210706-131537-root.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16773 and previous config saved to /var/cache/conftool/dbconfig/20210706-120242-root.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P16772 and previous config saved to /var/cache/conftool/dbconfig/20210706-115820-marostegui.json
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16771 and previous config saved to /var/cache/conftool/dbconfig/20210706-115732-marostegui.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16770 and previous config saved to /var/cache/conftool/dbconfig/20210706-114739-root.json
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16769 and previous config saved to /var/cache/conftool/dbconfig/20210706-113235-root.json
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16768 and previous config saved to /var/cache/conftool/dbconfig/20210706-111731-root.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2071', diff saved to https://phabricator.wikimedia.org/P16767 and previous config saved to /var/cache/conftool/dbconfig/20210706-111635-marostegui.json
  • 10:19 moritzm: installing jackson-databind security updates on buster
  • 09:01 _joe_: repooling wdqs1007 now that lag has caught up
  • 08:43 moritzm: installing libuv1 security updates on buster
  • 07:06 marostegui: Upgrade db1104 kernel
  • 06:54 moritzm: installing PHP 7.3 security updates on buster
  • 06:50 marostegui: Upgrade db1122 kernel
  • 06:35 marostegui: Upgrade db1138 kernel
  • 06:31 marostegui: Upgrade db1160 kernel
  • 00:56 eileen: process-control config revision is 8d46b52ed4
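
The db2071 and db2072 repools logged above step through 25%, 50%, 75% and 100% at roughly 15-minute intervals. A minimal Python sketch of that schedule follows. `repool_schedule` and its parameters are hypothetical names used for illustration only; the sketch computes times and percentages and does not call dbctl.

```python
from datetime import datetime, timedelta


def repool_schedule(start, steps=(25, 50, 75, 100), interval_minutes=15):
    """Return (time, percentage) pairs for a staged repool.

    Hypothetical helper: the step sizes and the 15-minute interval are
    read off the log entries above, not taken from any tool's defaults.
    """
    return [
        (start + timedelta(minutes=i * interval_minutes), pct)
        for i, pct in enumerate(steps)
    ]


# Start time taken from the first db2072 repool entry (13:15).
schedule = repool_schedule(datetime(2021, 7, 6, 13, 15))
for when, pct in schedule:
    print(f"{when:%H:%M} db2072 (re)pooling @ {pct}%")
```

With the 13:15 start, the last step lands at 14:00, which lines up with the 100% entry for db2072.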

2021-07-05

  • 17:40 legoktm: published fixed docker-registry.discovery.wmnet/nodejs10-devel:0.0.4 image (T286212)
  • 15:24 _joe_: leaving wdqs1007 depooled so that the updater can recover faster, now at 16.5 hours of lag
  • 14:01 moritzm: uploaded nginx 1.13.9-1+wmf3 for stretch-wikimedia
  • 12:50 marostegui: Stop MySQL on db1117:3321 to clone db1125 T286042
  • 11:29 moritzm: installing openexr security updates on stretch
  • 11:07 moritzm: installing tiff security updates on stretch
  • 10:48 moritzm: upgrading PHP on miscweb*
  • 10:37 jbond: enable puppet fleet wide to post puppetdb change
  • 10:29 marostegui: Optimize ruwiki.logging on s6 eqiad with replication T286102
  • 10:27 jbond: disable puppet fleet wide to perform puppetdb change
  • 08:15 moritzm: rolling out debmonitor-client 0.3.0
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
  • 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
  • 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
  • 07:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
  • 07:04 _joe_: restarting blazegraph, then restarting the updater again
  • 06:48 moritzm: start rasdaemon on sretest1001, didn't start after last reboot from a week ago
  • 06:47 _joe_: restart wdqs-updater on wdqs1007
  • 00:53 eileen: process-control config revision is a1717c7fde
  • 00:47 eileen: process-control config revision is 24565578f7

2021-07-04

2021-07-03

  • 17:46 elukey: depool eqsin due to loss of power redundancy (equinix maintenance) - T286113
  • 09:12 Amir1: restarting mailman3-web on lists1001 to pick up patches for T283659
  • 08:53 Amir1: patching postorius and mailmanclient on lists1001 for T283659

2021-07-02

  • 22:06 foks: removing three files for legal compliance
  • 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:22 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:59 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:54 kormat@cumin1001: START - Cookbook sre.dns.netbox
  • 15:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:17 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dragonfly-supernode1001.eqiad.wmnet
  • 15:07 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
  • 15:05 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dragonfly-supernode1001.eqiad.wmnet
  • 15:02 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
  • 14:54 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
  • 14:53 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
  • 14:52 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
  • 14:40 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[0-1].eqiad.wmnet
  • 14:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-9].eqiad.wmnet
  • 14:38 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
  • 14:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw142[0-1].eqiad.wmnet
  • 14:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-9].eqiad.wmnet
  • 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw142[0-1].eqiad.wmnet
  • 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw141[4-9].eqiad.wmnet
  • 14:15 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw142[0-1].eqiad.wmnet
  • 14:12 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw141[4-9].eqiad.wmnet
  • 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry[2005-2008].codfw.wmnet
  • 13:54 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry[2005-2008].codfw.wmnet
  • 13:32 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=registry200[5-8].codfw.wmnet,dc=codfw,cluster=docker-registry
  • 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 13:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
  • 13:11 mutante: mw2380 - rebooting
  • 13:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
  • 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
  • 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
  • 12:24 moritzm: added btullis to pwstore
  • 12:06 mutante: mw2380 /puppetmaster: reimaged, revoking old cert, signing new cert, initial puppet run T285603
  • 11:51 mutante: mw2380 - PXE booting - does not boot from hard disk
  • 11:28 mutante: powercycling mw2380, trying to make it boot
  • 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 11:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 11:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 10:33 jforrester@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/WikibaseMediaInfo: UploadWizard/WikibaseMediaInfo fix 3fd2873 for T285579 (duration: 00m 59s)
  • 09:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1268.eqiad.wmnet
  • 09:37 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: Fix handling of geEnabled flag (T285996) (duration: 00m 57s)
  • 09:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1268.eqiad.wmnet
  • 09:24 godog: test thanos 0.21.1 locally on thanos-fe2001 and depool the host - T285835
  • 09:19 dcausse: restart blazegraph on wdqs1013
  • 09:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1267.eqiad.wmnet
  • 09:04 mutante: decom'ing mw1267
  • 09:02 moritzm: installing node-hosted-git-info security updates
  • 09:02 tgr: deploying emergency backport: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/702808
  • 08:54 moritzm: installing golang-docker-credential-helpers security updates
  • 08:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1267.eqiad.wmnet
  • 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 08:03 moritzm: installing ipmitool security updates
  • 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1268.eqiad.wmnet
  • 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1267.eqiad.wmnet
  • 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
  • 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
  • 07:25 dcausse: installing openjdk-8-dbg on wdqs1013
  • 03:14 ryankemper: T264053 `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo run-puppet-agent --force'`
  • 03:11 ryankemper: T264053 `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo apt update'` fixed the issue
  • 03:07 ryankemper: T264053 `Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install elasticsearch-madvise' returned 100: Reading package lists...` grr
  • 03:07 ryankemper: T264053 `ryankemper@elastic2054:~$ sudo run-puppet-agent --force`
  • 03:06 ryankemper: T264053 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/702791; will run puppet on single host
  • 03:05 ryankemper: T264053 `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo disable-puppet "verify new deb package works - T264053"'`
  • 03:02 legoktm: uploaded elasticsearch-madvise_0.1~deb9u1_amd64.changes to stretch-wikimedia on apt1001
  • 01:47 eileen: civicrm revision changed from e07c2be1a7 to bb62188ec6, config revision is 1739c53fcb
  • 01:16 legoktm: uploaded elasticsearch-madvise 0.1 to apt.wm.o (T264053)
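
Several entries for this day name hosts with a bracketed numeric range, such as registry[2005-2008].codfw.wmnet or mw[1420-1421].eqiad.wmnet. The Python sketch below expands one such range into individual hostnames. `expand_range` is a hypothetical helper that handles a single `[start-end]` range only; it is not the parser the cookbooks themselves use and does not cover the selector syntax in the conftool entries.

```python
import re


def expand_range(pattern):
    """Expand one '[start-end]' numeric range in a hostname pattern.

    Hypothetical helper for illustration: supports a single range and
    returns the pattern unchanged when no range is present.
    """
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if m is None:
        return [pattern]
    prefix, start, end, suffix = m.groups()
    width = len(start)  # keep any zero padding from the start value
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


hosts = expand_range("registry[2005-2008].codfw.wmnet")
for host in hosts:
    print(host)
```

A pattern without brackets, such as mw1268.eqiad.wmnet, comes back as a one-element list.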

2021-07-01

  • 23:29 thcipriani@deploy1002: Synchronized README: Config: Revert "deployment training: readme whitespace" (duration: 00m 56s)
  • 23:21 thcipriani@deploy1002: Synchronized README: Config: deployment training: readme whitespace (duration: 00m 57s)
  • 22:37 urbanecm: Start server-side upload for 1 video file (T285182)
  • 22:36 urbanecm: Start server-side upload for 1 video file (T285789)
  • 22:31 dancy@deploy1002: Synchronized .pipeline: Config: Use train-versions.json to map from version to image tag (T282824) (duration: 00m 57s)
  • 22:27 urbanecm: Start server-side upload for 1 video file (T285682)
  • 21:43 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: Temporarily disable notification for security patch failures (duration: 00m 57s)
  • 19:45 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.12
  • 19:41 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 12s)
  • 19:39 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
  • 19:35 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/tests/phpunit/includes/TitleMethodsTest.php: Backport: Consistently normalize Title::mFragment before setting (T285951) (duration: 01m 10s)
  • 19:34 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/Title.php: Backport: Consistently normalize Title::mFragment before setting (T285951) (duration: 01m 10s)
  • 19:18 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/.pipeline/config.yaml: Backport: Trigger update-train-versions job at end of wmf-publish pipeline (duration: 01m 08s)
  • 18:55 otto@deploy1002: Finished deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883] (duration: 05m 19s)
  • 18:50 otto@deploy1002: Started deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883]
  • 18:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7995f7a: Use Vue.js for QuickSurveys on available wikis (T285890) (duration: 01m 09s)
  • 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: 654877f: EventDispatcher: Ensure we fetch page content from the primary database (T285895) (duration: 01m 12s)
  • 18:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: 6d90430: EventDispatcher: Ensure we fetch page content from the primary database (T285895) (duration: 01m 14s)
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:28 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.12"
  • 16:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:23 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: T285959 (duration: 01m 20s)
  • 16:11 vgutierrez: restart varnish-fe on cp3059 - T285953
  • 14:58 papaul: poweroff mw2380 for disk replacement
  • 14:57 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
  • 14:53 effie: depool mw2380 for disk repair - T285603
  • 14:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:51 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:45 moritzm: installing glib2.0 security updates on buster
  • 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts maps2002.codfw.wmnet
  • 13:35 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts maps2002.codfw.wmnet
  • 13:03 marostegui: Deploy schema change on s2 eqiad master T276150
  • 12:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1266.eqiad.wmnet
  • 12:39 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1266.eqiad.wmnet
  • 12:37 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 12:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1264-1265].eqiad.wmnet
  • 12:23 tgr: EU deploys done
  • 12:22 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/: Backport: Welcome tour: Mark as complete when notice is shown (T284800) SuggestedEdits: Return default JS data as 'noresults' (T285906) (duration: 01m 08s)
  • 12:20 tgr@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: Backport: Welcome tour: Mark as complete when notice is shown (T284800) SuggestedEdits: Return default JS data as 'noresults' (T285906) (duration: 01m 09s)
  • 12:19 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1264-1265].eqiad.wmnet
  • 12:09 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1263.eqiad.wmnet
  • 11:58 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1263.eqiad.wmnet
  • 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/: Backport: Stop using legacy entityNamespaces setting in onSetupAfterCache hook (T285472) (duration: 01m 15s)
  • 11:46 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1262.eqiad.wmnet
  • 11:35 elukey: reboot ml-serve-ctrl200[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
  • 11:35 marostegui: Deploy schema change on s8 eqiad master T276150
  • 11:33 elukey: reboot ml-serve-ctrl100[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
  • 11:33 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1262.eqiad.wmnet
  • 11:19 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Avoid using MWNamespace (duration: 01m 06s)
  • 11:07 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 10:27 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 10:05 moritzm: installing remaining libgcrypt20 security updates
  • 09:56 moritzm: installing remaining gnutls28 security updates
  • 09:55 Amir1: start of clean up of autoreview logs in ruwiki (T285608)
  • 09:47 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:36 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:36 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:35 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:35 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:05 marostegui: Deploy schema change on s1 eqiad (db1157) master T277123
  • 08:52 marostegui: Deploy schema change on s1 eqiad (db1163) master T277123
  • 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1261.eqiad.wmnet
  • 08:28 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1261.eqiad.wmnet
  • 08:23 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw126[2-6].eqiad.wmnet
  • 08:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw126[2-6].eqiad.wmnet
  • 08:13 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1261.eqiad.wmnet
  • 08:11 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
  • 07:06 marostegui: Deploy schema change on s4 eqiad (db1138) master T277123
  • 06:34 marostegui: Deploy schema change on s7 eqiad (db1136) masters T277123
  • 06:31 marostegui: Deploy schema change on s2,s8 eqiad masters T277123
  • 05:57 marostegui: Deploy schema change on s5 eqiad master (db1130) T277123
  • 05:55 marostegui: Deploy schema change on s6 eqiad master (db1173) T277123
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P16750 and previous config saved to /var/cache/conftool/dbconfig/20210701-055243-marostegui.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P16749 and previous config saved to /var/cache/conftool/dbconfig/20210701-052702-marostegui.json
  • 04:48 marostegui: Disconnect eqiad -> codfw replication from s1-s8
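
Entries in this log share the shape `HH:MM actor: message`, where the actor is either a bare nick (marostegui) or `user@host` (jelto@cumin1001). The Python sketch below splits a line into those parts. `Entry` and `parse_entry` are hypothetical names, and the regular expression is an assumption based on the entries shown here rather than a documented format.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Optional bullet, HH:MM, actor up to the first ': ', then the message.
ENTRY_RE = re.compile(r"^\s*(?:•\s+)?(\d{2}:\d{2})\s+(\S+?):\s+(.*)$")


@dataclass
class Entry:
    time: str
    user: str
    host: Optional[str]  # None when the actor is a bare nick
    message: str


def parse_entry(line):
    """Parse one log line; return None for headings and blank lines."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    time, actor, message = m.groups()
    user, _, host = actor.partition("@")
    return Entry(time, user, host or None, message)


entry = parse_entry(
    "  • 08:11 jelto@cumin1001: conftool action : set/pooled=no; "
    "selector: name=mw1261.eqiad.wmnet"
)
print(entry)
```

Date headings such as 2021-07-01 do not match the pattern, so `parse_entry` returns None for them and a caller can use that to detect the start of a new day.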

2021-06-30

  • 23:28 urbanecm: Evening B&C window finished
  • 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 667d880: Add Parsoid to wmgMonologChannels with warning level (duration: 01m 07s)
  • 23:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: 8e719d5: Add Parsoid to wmgMonologChannels (duration: 00m 38s)
  • 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8e719d5: Add Parsoid to wmgMonologChannels (duration: 01m 07s)
  • 21:43 Amir1: deleting auto-review logs from test2wiki (T285608)
  • 21:40 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T284931 T284459 T284394)
  • 21:29 cstone: civicrm revision changed from 789c92d13b to e07c2be1a7
  • 21:23 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T284931 T284459 T284394)
  • 19:06 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 07s)
  • 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
  • 18:57 legoktm: legoktm@mwmaint2002:~$ sudo systemctl start mediawiki_job_purge_parsercache_pc[123] # to start split purge jobs ahead of the timers
  • 18:54 legoktm: legoktm@mwmaint2002:~$ sudo systemctl stop mediawiki_job_parser_cache_purging.service # to stop zombie service
  • 18:53 Amir1: adding urbanecm as admin of newprojects mailing list
  • 18:12 Jeff_Green: authdns-update to deploy A/PTR records for frdev1002.frack.eqiad.wmnet
  • 17:57 thcipriani: restart ci jenkins following upgrade
  • 17:54 thcipriani: restart releases-jenkins following upgrade
  • 17:16 moritzm: imported jenkins 2.289.2 to thirdparty/ci T285532
  • 16:30 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki 'Tech/Server_switch_2020' 'Tech/Server_switch' 'Martin Urbanec' --move-subpages --reason='per phab:T285866' # T285866
  • 16:10 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 46s)
  • 16:08 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 01s)
  • 16:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating banwikisource (T284389) (duration: 01m 20s)
  • 16:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating banwikisource (T284389) (duration: 01m 16s)
  • 16:03 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating banwikisource (T284389) (duration: 01m 17s)
  • 16:02 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating banwikisource (T284389)
  • 16:00 urbanecm@deploy1002: Synchronized dblists: Creating banwikisource (T284389) (duration: 01m 17s)
  • 15:58 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating banwikisource (T284389) (duration: 01m 14s)
  • 15:57 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating banwikisource (T284389) (duration: 01m 13s)
  • 15:48 urbanecm@deploy1002: Synchronized langlist: Creating shiwiki (T284885) (duration: 01m 16s)
  • 15:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating shiwiki (T284885) (duration: 01m 16s)
  • 15:46 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating shiwiki (T284885) (duration: 01m 13s)
  • 15:44 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating shiwiki (T284885) (duration: 01m 15s)
  • 15:43 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating shiwiki (T284885)
  • 15:41 urbanecm@deploy1002: Synchronized dblists: Creating shiwiki (T284885) (duration: 01m 14s)
  • 15:40 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating shiwiki (T284885) (duration: 01m 14s)
  • 15:38 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating shiwiki (T284885) (duration: 01m 14s)
  • 15:31 urbanecm@deploy1002: Synchronized langlist: Creating dagwiki (T284450) (duration: 01m 12s)
  • 15:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating dagwiki (T284450) (duration: 01m 14s)
  • 15:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating dagwiki (T284450) (duration: 01m 16s)
  • 15:27 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating dagwiki (T284450) (duration: 01m 16s)
  • 15:26 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating dagwiki (T284450)
  • 15:25 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=dagwiki --cluster=all # T284450
  • 15:24 urbanecm@deploy1002: Synchronized dblists: Creating dagwiki (T284450) (duration: 01m 16s)
  • 15:22 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating dagwiki (T284450) (duration: 01m 13s)
  • 15:21 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating dagwiki (T284450) (duration: 01m 16s)
  • 15:07 sukhe: restarted dnsdist.service and pdns-recursor.service on O:wikidough to install gnutls/gcrypt updates
  • 15:06 urbanecm: sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1'
  • 13:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 13:26 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
  • 13:26 moritzm: installing fluidsynth security updates on stretch
  • 13:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
  • 13:18 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
  • 13:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
  • 13:11 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
  • 13:04 mutante: switching docker-registry to nginx light variant T164456
  • 13:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
  • 12:53 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
  • 12:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
  • 12:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
  • 12:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 12:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 12:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
  • 12:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
  • 12:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
  • 12:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
  • 12:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
  • 12:17 kart_: Updated cxserver to 2021-06-30-112813-production (T284900, T284885)
  • 12:13 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
  • 12:11 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:06 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:01 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 11:46 Lucas_WMDE: EU backport+config window done
  • 11:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseClientRepoConceptBaseUri (T257260) (2/2, beta) (disregard the earlier /3, I’m skipping the test file after all) (duration: 01m 04s)
  • 11:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseClientRepoConceptBaseUri (T257260) (1/3, prod) (duration: 01m 16s)
  • 11:35 moritzm: rolling restart of FPM/Apache on mw canaries to pick up gnutls/gcrypt security updates
  • 11:11 moritzm: installing libgcrypt security updates on buster
  • 11:09 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug2001:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1' # clean up old l10n cache
  • 11:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting Wikibase client repoConceptBaseUri (T257260) (duration: 01m 24s)
  • 10:44 moritzm: installing gnutls security updates on buster
  • 10:31 godog: add 200G to prometheus/eqiad for 'ops' instance
  • 09:35 godog: start swiftrepl-mw on ms-fe2005 post-switchover (credentials were missing) - T162123
  • 08:51 jelto: jelto@puppetmaster1001:~$ sudo puppet cert -s gitlab2001.wikimedia.org # approve puppet certificate request for gitlab2001, fingerprint checked
  • 08:47 topranks: Removing BGP peers for AS48237 (Etihad Etisalat) and AS11404 (Wave Division Holdings) from cr2-eqiad (peers have left Equinix IX)
  • 08:31 godog: remove sdf1 from thanos-be1003 in swift - T285835
  • 07:43 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thanos-be1003.eqiad.wmnet
  • 07:43 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
  • 07:43 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host thanos-be1003.eqiad.wmnet
  • 07:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
  • 05:46 ryankemper: [Cirrus] Unbanned `elastic2045`; now only `elastic2033` is banned in `codfw`
  • 00:36 tstarling@deploy1002: Synchronized wmf-config/db-labs.php: gerrit 701995 SQL query log (duration: 01m 05s)
  • 00:35 tstarling@deploy1002: Synchronized wmf-config/db-eqiad.php: gerrit 701995 SQL query log (duration: 01m 06s)
  • 00:34 tstarling@deploy1002: Synchronized wmf-config/db-codfw.php: gerrit 701995 SQL query log (duration: 01m 06s)
  • 00:32 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: gerrit 701995 SQL query log (duration: 01m 05s)
  • 00:31 tstarling@deploy1002: Synchronized docroot/noc/db.php: gerrit 701995 SQL query log (duration: 01m 06s)
  • 00:27 tstarling@deploy1002: Synchronized wmf-config/logging.php: gerrit 701995 SQL query log (duration: 01m 15s)
  • 00:01 urbanecm: (following up previous SAL item) TrainBranchBot was removed from wmf-deployment group because of T285819

2021-06-29

  • 23:45 urbanecm: Evening B&C window done
  • 23:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 367bc98: 904d18720: flood flag changes for enwikibooks (T285594) (duration: 01m 07s)
  • 23:45 urbanecm: Remove TrainBranchBot from wmf-deployment Gerrit group, merges code to mediawiki-config without actually deploying it
  • 23:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: 8a5b835: SpecialEditGrowthConfig: Do not use relative => true (T285750) (duration: 01m 04s)
  • 23:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: c61fb17: SpecialEditGrowthConfig: Do not use relative => true (T285750) (duration: 01m 05s)
  • 23:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/: bad8266: Config option to enable topic subscriptions backend and dtenable=1 URL parameter (T284491) (duration: 01m 05s)
  • 23:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/: bad8266: Config option to enable topic subscriptions backend and dtenable=1 URL parameter (T284491) (duration: 01m 06s)
  • 23:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/: e77e002: Config option to enable topic subscriptions backend and dtenable=1 URL parameter (T284491) (duration: 01m 09s)
  • 21:58 maryum: deployed security patch T285515 to wmf.12
  • 21:51 maryum: deployed security patch T285515 to wmf.11
  • 21:44 maryum: deployed updated security patch for T285190 to wmf.12
  • 21:42 maryum: deployed updated security patch for T285190 to wmf.11
  • 21:31 sbassett: Reverted and deployed updated security patch for T285190 to wmf.12
  • 21:29 sbassett: Reverted and deployed updated security patch for T285190 to wmf.11
  • 21:19 sbassett: Deployed updated security patch for T285190 to wmf.11
  • 20:55 dancy: Deleted all CDB files on beta so they'll be recreated on the next scap sync-world run
  • 20:26 dancy: Reverting to scap 3.17.1-1+0~20210419163335.8~1.gbpa6b2e0 in beta
  • 19:43 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: REIMAGE
  • 19:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: REIMAGE
  • 19:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: REIMAGE
  • 19:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: REIMAGE
  • 19:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: REIMAGE
  • 19:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: REIMAGE
  • 19:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: REIMAGE
  • 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: REIMAGE
  • 19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.12
  • 18:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 18:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 18:34 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc3
  • 18:28 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc2
  • 18:21 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc1
  • 18:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 18:09 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 18:07 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.7 (duration: 04m 00s)
  • 17:59 urbanecm: Start server-side upload of ~2.5G of JPG files (T282755)
  • 17:52 brennen@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.12 (duration: 57m 11s)
  • 16:55 ryankemper: T281327 `[Cirrus -> codfw]` Current banned nodes are `elastic2043` and `elastic2045`; `elastic2043` can be unbanned after a re-image, and `elastic2045` can be unbanned in ~30 minutes after shards rebalance (had heavy shards scheduled)
  • 16:55 brennen@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.12
  • 16:45 brennen: 1.37.0-wmf.12 was branched at 3703c31 for T281153
  • 16:28 ebernhardson: temporarily ban elastic2045 from production-search-codfw
  • 15:43 dcausse: unbanning elastic2054
  • 15:30 dcausse: restarting blazegraph on wdqs1012
  • 15:17 effie: pool mw2383 back
  • 15:15 mutante: [mwlog2002:~] $ sudo systemctl start mw-log-cleanup
  • 15:06 dcausse: banning elastic2054
  • 14:53 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[1-2].codfw.wmnet,service=canary
  • 14:52 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[8-9].codfw.wmnet,service=canary
  • 14:52 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw225[1-2].codfw.wmnet,service=canary
  • 14:52 effie: depool mw2383 as it is misbehaving
  • 14:47 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 14:47 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 14:47 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw226[1-2].codfw.wmnet
  • 14:47 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2290.codfw.wmnet
  • 14:46 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:46 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw22[7-8][0-9].codfw.wmnet
  • 14:45 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw225[1-8].codfw.wmnet
  • 14:44 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:44 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw225[1-8].codfw.wmnet,service=api_appserver
  • 14:43 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
  • 14:38 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
  • 14:38 _joe_: restarting php-fpm on mw2383
  • 14:38 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 14:37 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2103 (s1) weight a bit', diff saved to https://phabricator.wikimedia.org/P16739 and previous config saved to /var/cache/conftool/dbconfig/20210629-143742-marostegui.json
  • 14:37 _joe_: repooling mw2383
  • 14:36 _joe_: depooling mw2383
  • 14:30 legoktm@deploy1002: Synchronized wmf-config/db-codfw.php: fix trwikivoyage (duration: 01m 01s)
  • 14:29 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 14:29 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 14:28 Krinkle: TODO: Don't duplicate `sectionsByDB` between db-* files
  • 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:23 jayme@cumin1001: MediaWiki read-only period ends at: 2021-06-29 14:23:23.504447
  • 14:23 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:23 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 14:22 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 14:22 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 14:22 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 14:22 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 14:21 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:21 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 14:21 jayme@cumin1001: MediaWiki read-only period starts at: 2021-06-29 14:21:26.671853
  • 14:21 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 14:15 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 14:15 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 14:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 44 hosts with reason: DC switchover
  • 14:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 44 hosts with reason: DC switchover
  • 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 14:11 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 14:10 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 14:09 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 14:08 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 14:02 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:01 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 14:01 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 13:51 otto@deploy1002: Started deploy [analytics/refinery@edc31a2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@edc31a2]
  • 13:49 otto@deploy1002: Finished deploy [analytics/refinery@edc31a2] (thin): Regular analytics weekly train THIN [analytics/refinery@edc31a2] (duration: 00m 07s)
  • 13:49 otto@deploy1002: Started deploy [analytics/refinery@edc31a2] (thin): Regular analytics weekly train THIN [analytics/refinery@edc31a2]
  • 13:49 otto@deploy1002: Finished deploy [analytics/refinery@edc31a2]: Regular analytics weekly train [analytics/refinery@COMMIT_HASH] (duration: 17m 42s)
  • 13:35 volker-e@deploy1002: Finished deploy [design/style-guide@e97fccb]: Deploy design/style-guide: e97fccb styles: Add internationalization and accessibility note labels and treatments (#476) (duration: 00m 07s)
  • 13:34 volker-e@deploy1002: Started deploy [design/style-guide@e97fccb]: Deploy design/style-guide: e97fccb styles: Add internationalization and accessibility note labels and treatments (#476)
  • 13:31 otto@deploy1002: Started deploy [analytics/refinery@edc31a2]: Regular analytics weekly train [analytics/refinery@COMMIT_HASH]
  • 11:54 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: vector: Finish enabling language switcher treatment A/B test on fawiki (T269093) (duration: 00m 56s)
  • 11:38 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/Wikibase/repo/: Backport: Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634), Part II (duration: 00m 58s)
  • 11:36 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/Wikibase/repo/includes/Rdf/PropertyStubRdfBuilder.php: Backport: Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634), Part I (duration: 00m 56s)
  • 11:35 ladsgroup@deploy1002: sync-file aborted: Backport: Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634) (duration: 00m 10s)
  • 10:30 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on acmechief* after switch towards nginx-light T164456
  • 09:27 moritzm: installing nettle security updates on buster
  • 08:47 elukey: repool mw13[55,84] after debugging - T285634
  • 08:46 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1384.eqiad.wmnet
  • 08:46 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1355.eqiad.wmnet
  • 08:43 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
  • 08:25 elukey: cumin 'A:mw-eqiad' '/usr/local/sbin/restart-php7.2-fpm' -b 2 -s 30 - T285634
  • 08:21 elukey: depool mw1355 (mw appserver) for debugging - T285634
  • 08:21 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1355.eqiad.wmnet
  • 08:12 hashar: Upgrading Jenkins on contint2001 / contint1001 and restarting CI Jenkins # T285531
  • 08:03 hashar: Upgraded Jenkins on releases1002 / releases2002 # T285531
  • 08:02 hashar: Upgraded Jenkins on releases1002 / releases2002
  • 07:50 godog: remove 20G migration data /root/prometheus from prometheus4001 - T243057
  • 07:48 godog: remove old /root/prometheus data from prometheus4001
  • 07:05 moritzm: upgrading bullseye early installs to the latest state of testing T275873
  • 06:46 tstarling@deploy1002: Synchronized php-1.37.0-wmf.11/includes/MediaWiki.php: Add statsd action timing metric T284274 (duration: 00m 58s)
  • 02:47 cdanis: ✔️ cdanis@cumin2001.codfw.wmnet ~ 🕥🍺 sudo cumin -b16 'A:cp-upload and A:codfw' 'run-puppet-agent -q'
  • 02:34 ryankemper: T285643 Banned `elastic1039` from all 3 elasticsearch clusters and set `elastic1039.eqiad.wmnet` to failed in netbox
  • 02:27 cdanis: ✔️ cdanis@cumin2001.codfw.wmnet ~ 🕥🍺 sudo cumin -b16 'A:cp-upload' 'run-puppet-agent -q'
  • 02:25 eileen: civicrm revision changed from 927ab7cff7 to 789c92d13b, config revision is 1739c53fcb
  • 02:04 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@0e916b1]: 0.3.75 (duration: 08m 40s)
  • 01:57 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.75` on canary `wdqs1003`; proceeding to rest of fleet
  • 01:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@0e916b1]: 0.3.75
  • 01:50 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.75`. Pre-deploy tests passing on canary `wdqs1003`
  • 00:25 Krinkle: krinkle@mwmaint1002: purgeParserCache.php --tag pc1, ref T282761
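
As a worked check on the two "MediaWiki read-only period" timestamps logged during the 2021-06-29 switchover above, this minimal Python sketch computes how long the read-only window lasted. The function name `read_only_seconds` is hypothetical and is not part of spicerack or the switchdc cookbooks. Only the two timestamp strings are taken from the log.

```python
from datetime import datetime


def read_only_seconds(start: str, end: str) -> float:
    """Return the seconds elapsed between two log timestamps.

    Both arguments use the 'YYYY-MM-DD HH:MM:SS.ffffff' form that the
    read-only start/end log entries print.
    """
    started = datetime.fromisoformat(start)
    ended = datetime.fromisoformat(end)
    return (ended - started).total_seconds()


# Timestamps copied from the 14:21 and 14:23 entries on 2021-06-29.
duration = read_only_seconds(
    "2021-06-29 14:21:26.671853",
    "2021-06-29 14:23:23.504447",
)
print(f"read-only for {duration:.1f} s")
```

With these two values, the window comes out just under two minutes.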

2021-06-28

  • 23:07 urbanecm: Evening B&C window done
  • 23:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5ec855d: Enable Parsoid inspired media structure on test wikis (T51097) (duration: 00m 59s)
  • 22:51 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 22:51 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 22:50 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 22:48 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 22:48 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
  • 22:44 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
  • 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 22:43 legoktm@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2021-06-28 22:43:04.512602
  • 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 22:42 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 22:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 22:42 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 22:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 22:42 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 22:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 22:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 22:41 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 22:41 legoktm@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2021-06-28 22:41:41.222740
  • 22:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 22:40 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 22:40 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 22:40 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 22:38 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 22:38 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 22:32 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 22:32 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 22:32 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 22:31 legoktm: starting DC switchover live test, which will "switch" us from codfw -> eqiad
  • 22:28 eileen: civicrm revision changed from 9d1203fb28 to 927ab7cff7, config revision is 1739c53fcb
  • 22:09 legoktm: live-hacked spicerack on cumin1001 to ignore x2, see https://phabricator.wikimedia.org/T285519#7182377
  • 21:55 Krinkle: krinkle@mwmaint1002: purgeParserCache.php --tag pc2, ref T282761
  • 20:03 cstone: payments-wiki revision is d9892207c1
  • 19:48 krinkle@deploy1002: Synchronized php-1.37.0-wmf.11/maintenance/: I618bc1 (duration: 00m 56s)
  • 19:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.11/includes/libs/objectcache/: T282761 - I618bc1 (duration: 00m 56s)
  • 19:45 krinkle@deploy1002: Synchronized php-1.37.0-wmf.11/includes/objectcache/SqlBagOStuff.php: T282761 - I618bc1 (duration: 00m 59s)
  • 18:40 ebernhardson@deploy1002: Synchronized wmf-config/: T281515: Prepare Cirrus more_like for dc switchover (duration: 01m 02s)
  • 18:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/WelcomeSurveyHooks.php: ecf1d6c: Make it possible to force opt-in/opt-out to Growth features during account creation (T284119; T284800; 3/3) (duration: 00m 55s)
  • 18:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/HelpPanelHooks.php: ecf1d6c: Make it possible to force opt-in/opt-out to Growth features during account creation (T284119; T284800; 2/3) (duration: 00m 55s)
  • 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/HomepageHooks.php: ecf1d6c: Make it possible to force opt-in/opt-out to Growth features during account creation (T284119; T284800; 1/3) (duration: 00m 58s)
  • 18:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/VisualEditor/: 794a46c: Hotfix for broken "Extract show all to placeholder class" (T284636; T285571) (duration: 00m 57s)
  • 18:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4ae0fdd: Enable DiscussionTools topicsubscription as beta feature on partner wikis (T274280) (duration: 00m 57s)
  • 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5b59184: Remove redundant wgDiscussionToolsEnable overrides (duration: 00m 56s)
  • 18:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1043c93: Growth: Enable community configuration at all Growth wikis (T285423) (duration: 00m 56s)
  • 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:44 sukhe: Traffic: depool eqiad from user traffic
  • 15:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:30 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-.*,name=eqiad
  • 15:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:09 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 15:08 jayme@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 15:07 gehel: restarting wdqs-updater on all wdqs hosts for new configuration
  • 14:54 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 14:53 jayme@cumin1001: Switching services swift, proton, mathoid, restbase, swift-ro, eventstreams, search, shellbox, eventgate-analytics-external, wdqs-internal, kartotherian, api-gateway, termbox, mobileapps, similar-users, wikifeeds, apertium, restbase-async, eventgate-main, eventgate-logging-external, ores, sessionstore, linkrecommendation, echostore, push-notifications, citoid, zotero, eventgate-analytics, wdqs, eventstreams-i
  • 14:53 jayme@cumin1001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 14:37 jayme@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=99)
  • 14:36 jayme@cumin1001: Switching services kartotherian, proton, wdqs-internal, wikifeeds, zotero, recommendation-api, swift-ro, linkrecommendation, mobileapps, citoid, eventgate-analytics, push-notifications, eventstreams-internal, mathoid, similar-users, schema, apertium, restbase-async, shellbox, termbox, wdqs, ores, eventgate-analytics-external, swift, helm-charts, restbase, cxserver, search, sessionstore, eventstreams, api-gate
  • 14:36 jayme@cumin1001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 14:35 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 14:29 jayme@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 14:21 effie: restarted mw[1322,1329,1333,1350,1351,1352,1353,1354,1366,1367,1368,1370,1372,1373]
  • 14:07 effie: restarting busy php-fpm app servers
  • 13:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseRepoForeignRepositories (T257260) (2/2, beta) (duration: 00m 57s)
  • 13:06 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseRepoForeignRepositories (T257260) (1/2, prod) (duration: 00m 57s)
  • 12:59 moritzm: installing intel-microcode security updates on buster
  • 12:30 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes/media/MediaHandler.php: Backport: media: Handle lack of 'metadata' key from getSizeAndMetadata gracefully (T285490) (duration: 00m 56s)
  • 12:24 dcausse: repool wdqs1012
  • 12:00 Lucas_WMDE: EU backport+config window done
  • 11:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting Wikibase repo foreignRepositories (T257260) (duration: 00m 55s)
  • 11:40 XioNoX: push "Port cloud-in4 to Capirca" to cr1/2-eqiad
  • 11:38 XioNoX: push "Port cloud-in4 to Capirca" to cr1/2-codfw
  • 11:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e4a088f: vector: Enable language switcher treatment A/B test on fawiki (T269093) (duration: 00m 55s)
  • 11:28 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/modules/signup/campaign.less: cd16aa2: Donor campaign: fix signup page styling (T284740) (duration: 00m 56s)
  • 11:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9495d18: GrowthExperiments: Update campaign pattern (T284800) (duration: 00m 56s)
  • 11:20 Lucas_WMDE: lucaswerkmeister-wmde@mw1384:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache && rmdir /srv/mediawiki/php-1.37.0-wmf.1' # per comments in T157030 and similar tasks
  • 11:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from buster master maps1009
  • 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from buster master maps1009
  • 11:18 Lucas_WMDE: lucaswerkmeister-wmde@mw1384:~$ scap pull # did not print any errors
  • 11:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ade641b: Deploy ContentTranslation out of Beta feature in 9 WPs (T284641) (duration: 00m 56s)
  • 10:44 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:43 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:25 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2007.codfw.wmnet with reason: REIMAGE
  • 10:23 mutante: sodium - restarted nginx
  • 10:23 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2007.codfw.wmnet with reason: REIMAGE
  • 10:22 mutante: sodium (mirrors.wikimedia.org) - switching to nginx light variant T164456
  • 10:11 vgutierrez: rolling upgrade of ATS on eqiad - T285535
  • 10:11 moritzm: installing remaining libxml2 security updates
  • 09:52 vgutierrez: rolling upgrade of ATS on esams - T285535
  • 09:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseClientChangesDatabase (T257260) (2/2, beta) (duration: 00m 56s)
  • 09:41 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseClientChangesDatabase (T257260) (1/2, prod) (duration: 00m 57s)
  • 09:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org
  • 09:40 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
  • 09:39 Lucas_WMDE: ^ wrong gerrit change used for message, sorry
  • 09:39 lucaswerkmeister-wmde@deploy1002: sync-file aborted: Config: Stop setting Wikibase repo foreignRepositories (T257260) (1/2, prod) (duration: 00m 10s)
  • 09:27 vgutierrez: rolling upgrade of ATS on eqsin - T285535
  • 09:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting Wikibase client changesDatabase (T257260) (duration: 00m 55s)
  • 08:56 vgutierrez: rolling upgrade of ATS on codfw - T285535
  • 08:53 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Set idGeneratorInErrorPingLimiter to 9 for Wikidata (T284538), Part II (duration: 00m 57s)
  • 08:51 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set idGeneratorInErrorPingLimiter to 9 for Wikidata (T284538), Part I (duration: 00m 56s)
  • 08:48 mutante: phab1001 - removing 2fa for my own account
  • 08:40 vgutierrez: rolling upgrade of ATS on ulsfo - T285535
  • 08:40 jayme: drain kubestage2002 for docker restart(s)
  • 08:33 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove idGeneratorRateLimiting from production config (T274157), Part II (duration: 00m 55s)
  • 08:31 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Remove idGeneratorRateLimiting from production config (T274157), Part I (duration: 00m 58s)
  • 08:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove special configurations for Dagbani in Wikibase code (T283168) (duration: 00m 56s)
  • 08:25 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
  • 08:23 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
  • 08:21 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Set Wikidata's main sandbox item (T219215), Part II (duration: 00m 56s)
  • 08:19 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set Wikidata's main sandbox item (T219215), Part I (duration: 00m 57s)
  • 08:19 jynus: stop and remove db1145:s5 db2099:s5 T283235
  • 07:58 dcausse: depool and restart blazegraph on wdqs1012
  • 07:57 jelto: jelto@cumin1001:~$ sudo cumin install* 'run-puppet-agent' # update DHCP entry for gitlab2001 on install[1003,2003,3001,4001,5001].wikimedia.org
  • 07:57 dcausse: repool wdqs1005
  • 07:46 hashar@deploy1002: Finished deploy [integration/docroot@cf677eb]: integration: Change agents dashboard link from Nagf to Grafana (duration: 00m 08s)
  • 07:46 hashar@deploy1002: Started deploy [integration/docroot@cf677eb]: integration: Change agents dashboard link from Nagf to Grafana
  • 06:16 XioNoX: remove BGP to AS13768 in AMS-IX

2021-06-27

  • 09:10 elukey: cumin 'A:mw-eqiad and not P{mw13[67,54,55,72,33,50,51,73,52,49,53,65,71,84,68,70,66,91,89,97,95,99,85,93,87]*} and not P{mw14[09,03,11,07,05,01]*} and not P{mw12[61-69]*} and not P{mwdebug*}' '/usr/local/sbin/restart-php7.2-fpm' -b 1 -s 30
  • 09:10 elukey: roll restart the remaining mw appservers to clear out apcu fragmentation (cumin command to follow)
  • 08:58 elukey: slow roll restart (cumin -b 1 -s 30) of mw126[1-7]'s php-fpm (75-80% of apcu fragmentation)
  • 08:37 elukey: restart php-fpm on mw1268 mw1269 - low idle workers
  • 08:23 elukey: restart php-fpm on mw1401

2021-06-26

  • 21:28 volans: upgraded spicerack to v0.0.56 on the cumin hosts (includes only bug fixes for the switchdc)
  • 21:23 volans: uploaded spicerack_0.0.56 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 16:37 elukey: restart php-fpm on mw1387
  • 15:43 elukey: restart php-fpm on mw1393
  • 15:39 elukey: restart php-fpm on mw1405 mw1399 mw1385
  • 15:37 elukey: restart php-fpm on mw1397 mw1395 mw1411 mw1407
  • 15:31 elukey: restart php-fpm on mw1391 mw1389 mw1403
  • 13:49 elukey: restart php-fpm on mw1368 mw1370 mw1366 mw1409
  • 13:43 elukey: depool mw1384 for investigation
  • 13:43 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1384.eqiad.wmnet
  • 13:33 elukey: restart phpfpm on mw1353 mw1365 mw1371
  • 13:30 elukey: restart php-fpm on mw1351 mw1373 mw1352 mw1349
  • 13:23 elukey: restart-phpfpm on mw1350 (0 idle php workers)
  • 13:20 elukey: restart-phpfpm on mw1333 (0 idle php workers)
  • 10:08 elukey: restart php-fpm on mw1372 - T285593
  • 10:07 elukey: restart php-fpm on mw1372 - T285593
  • 09:45 elukey: restart php-fpm on mw135[4-5]
  • 09:44 elukey: restart php-fpm on mw1354
  • 09:38 elukey: reboot mw1414 (not reachable via ssh, nor via mgmt console)
  • 09:33 elukey: restart php-fpm on mw1367 (php fatal memory errors, php7adm /apcu-frag returns errors)

2021-06-25

  • 21:37 ebernhardson@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/CirrusSearch/: cirrus: Revert "Stop querying ores_articletopic" (3/3) (duration: 01m 01s)
  • 21:35 ebernhardson@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/CirrusSearch/includes/Wikimedia/WeightedTagsHooks.php: cirrus: Revert "Stop querying ores_articletopic" (2/3) (duration: 00m 58s)
  • 21:34 ebernhardson@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/CirrusSearch/includes/Parser/FullTextKeywordRegistry.php: cirrus: Revert "Stop querying ores_articletopic" (1/3) (duration: 00m 58s)
  • 20:32 legoktm: legoktm@mwmaint1002:~$ sudo systemctl reset-failed # to clear icinga alert
  • 20:28 legoktm: legoktm@mwmaint1002:~$ sudo systemctl start mediawiki_job_update_special_pages.service (T285583)
  • 20:21 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.Homepage.SuggestedEdits.js: eaec745: SuggestedEdits: Only log task impression for EditCardWidget (T283546; emergency deployment) (duration: 01m 00s)
  • 18:08 legoktm: legoktm@ms-fe2005:~$ sudo systemctl unmask swiftrepl-mw.service
  • 15:46 mutante: mw1326, mw1327, mw1328, mw1329 ... restarted php-fpm
  • 15:41 mutante: mw1330, mw1320, mw1321, mw1322 - restarted php-fpm
  • 15:38 mutante: [mw1330:~] $ sudo restart-php7.2-fpm
  • 15:36 mutante: [mw1332:~] $ sudo restart-php7.2-fpm
  • 15:28 mutante: [mw1319:~] $ sudo restart-php7.2-fpm
  • 15:20 rzl: rzl@mw1320:~$ sudo restart-php7.2-fpm # workers stuck since the ~14:00 request spike
  • 15:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:44 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab2001.wikimedia.org
  • 14:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps2007.codfw.wmnet with reason: reimaging as buster replica
  • 14:28 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps2007.codfw.wmnet with reason: reimaging as buster replica
  • 13:50 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab2001.wikimedia.org
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 13:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 13:08 vgutierrez: update ATS to version 8.0.8-1wm4 on cp4026 and cp4032 - T285535
  • 13:06 vgutierrez: upload trafficserver 8.0.8-1wm4 to apt.wm.o (buster) - T285535
  • 12:28 moritzm: installing nmap bugfix update from Buster point release
  • 11:28 moritzm: installing 4.19.194 kernels on Buster from latest 10.10 point release (no reboots, just rolling out the packages)
  • 09:15 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol[1003-1005].wikimedia.org with reason: openstack issue
  • 09:15 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol[1003-1005].wikimedia.org with reason: openstack issue
  • 09:13 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on cloudcontrol1003.wikimedia.org with reason: Known issue, working on it
  • 09:13 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on cloudcontrol1003.wikimedia.org with reason: Known issue, working on it
  • 09:04 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
  • 09:02 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
  • 09:02 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet
  • 08:55 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
  • 08:54 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2001.codfw.wmnet
  • 08:52 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2001.codfw.wmnet
  • 08:52 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2002.codfw.wmnet
  • 08:51 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2002.codfw.wmnet
  • 08:51 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2003.codfw.wmnet
  • 08:48 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2003.codfw.wmnet
  • 08:12 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
  • 08:07 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
  • 08:07 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet
  • 08:04 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1002.eqiad.wmnet
  • 08:04 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
  • 08:01 elukey: reboot an-worker1101 to unblock stuck GPU
  • 08:00 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
  • 08:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
  • 07:58 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
  • 07:58 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
  • 07:57 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
  • 07:42 moritzm: imported Jenkins 2.289.1 to thirdparty/ci for buster-wikimedia T285531
  • 07:30 dcausse: depool and restart blazegraph on wdqs1005
  • 07:17 dcausse: installing openjdk-8-dbg on wdqs1005 to debug blazegraph

2021-06-24

  • 23:02 legoktm: reverted cumin1001 spicerack live hacks
  • 22:57 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 22:55 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 22:55 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 22:55 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 22:36 volans: set x2 codfw master back to RW
  • 22:30 legoktm@cumin1001: END (ERROR) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=97)
  • 22:29 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 22:29 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 22:29 legoktm@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2021-06-24 22:29:25.643909
  • 22:29 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 22:09 legoktm@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
  • 22:09 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 22:06 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 22:05 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 22:04 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 22:04 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 22:01 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 22:01 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 21:59 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 21:59 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 21:47 legoktm: live hacked spicerack on cumin1001 to revert https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/700963/
  • 20:58 legoktm: starting dry run and live test of DC switchover
  • 20:53 legoktm: legoktm@phab1001:~$ sudo /srv/phab/phabricator/bin/remove destroy M320 (spam)
  • 20:44 volans: uploaded spicerack_0.0.55 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 20:28 legoktm: re-enabled daily digests for wikimedia-l - T285486
  • 19:10 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.11
  • 19:07 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 04s)
  • 19:06 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
  • 19:04 dduvall: preparing to roll group2 to 1.37.0-wmf.11 (T281152) (cc risky patch contacts Amir1 Krinkle DannyS712)
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:18 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:12 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:12 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 06s)
  • 17:11 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
  • 17:08 dduvall: re-rolling group1 to 1.37.0-wmf.11 (T281152) following deployment of blocker fixes (cc risky patch contacts Amir1 Krinkle DannyS712)
  • 16:12 twentyafterfour: restarted php7.3-fpm on phab1001
  • 15:43 hnowlan: running `nodetool decommission` on maps2007
  • 15:42 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
  • 15:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2007.codfw.wmnet with reason: depooling and reimaging as buster replica
  • 15:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2007.codfw.wmnet with reason: depooling and reimaging as buster replica
  • 15:31 moritzm: installing jackson-databind security updates
  • 15:26 moritzm: installing ruby-websocket-extensions security updates
  • 15:02 hnowlan: reenabling puppet on P{C:Postgresql::Slave}
  • 14:59 moritzm: restarting mw canaries to pick up libxml2 security update
  • 14:57 moritzm: installing libxml2 security updates on buster
  • 14:46 hnowlan: Disabling puppet on P{C:Postgresql::Slave} (netboxdb2001,puppetdb2002, most maps hosts) to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/700071
  • 13:29 volans: uploaded python3-wmflib_0.0.8 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 12:45 tgr: EU deploys done
  • 12:44 tgr@deploy1002: Finished scap: Backport: Re-apply "Add custom signup flow for donors", step 3 (T284799 T284740 T284800 T285281) (duration: 26m 07s)
  • 12:18 tgr@deploy1002: Started scap: Backport: Re-apply "Add custom signup flow for donors", step 3 (T284799 T284740 T284800 T285281)
  • 12:08 tgr@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments: Backport: Re-apply "Add custom signup flow for donors", step 2 (T284799 T284740 T284800 T285281) (duration: 01m 06s)
  • 11:53 jayme: import dragonfly_1.0.6-1 into buster-wikimedia
  • 11:44 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on registry2008.codfw.wmnet with reason: Dragonfly tests (jayme)
  • 11:44 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on registry2008.codfw.wmnet with reason: Dragonfly tests (jayme)
  • 11:37 jayme: depooling registry2008 for some dragonfly testing
  • 11:37 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=registry2008.codfw.wmnet,dc=codfw,cluster=docker-registry
  • 11:34 tgr@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments: Backport: Re-apply "Add custom signup flow for donors", step 1 (T284799 T284740 T284800 T285281) (duration: 01m 06s)
  • 11:25 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update $wgNamespacesToBeSearchedDefault for wikimania (T284793) (duration: 01m 07s)
  • 11:21 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable OCR tool on all Wikisources (T285311) (duration: 01m 06s)
  • 11:11 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Enable link recommendation feature for more wikis (T284481) (duration: 01m 07s)
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16723 and previous config saved to /var/cache/conftool/dbconfig/20210624-092226-marostegui.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16722 and previous config saved to /var/cache/conftool/dbconfig/20210624-092157-marostegui.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16721 and previous config saved to /var/cache/conftool/dbconfig/20210624-092105-marostegui.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16720 and previous config saved to /var/cache/conftool/dbconfig/20210624-092029-marostegui.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16719 and previous config saved to /var/cache/conftool/dbconfig/20210624-091949-marostegui.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s2 weights T284897', diff saved to https://phabricator.wikimedia.org/P16718 and previous config saved to /var/cache/conftool/dbconfig/20210624-091753-marostegui.json
  • 09:02 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes: Backport: media: Make the file metadata "_error" check looser (T285431) (duration: 01m 12s)
  • 08:55 legoktm: root@lists1001:/var/log/mailman# rm -rf *
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s3 weights T284897', diff saved to https://phabricator.wikimedia.org/P16717 and previous config saved to /var/cache/conftool/dbconfig/20210624-084147-marostegui.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 weights T284897', diff saved to https://phabricator.wikimedia.org/P16716 and previous config saved to /var/cache/conftool/dbconfig/20210624-081409-marostegui.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 weights T284897', diff saved to https://phabricator.wikimedia.org/P16715 and previous config saved to /var/cache/conftool/dbconfig/20210624-081251-marostegui.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 weights T284897', diff saved to https://phabricator.wikimedia.org/P16714 and previous config saved to /var/cache/conftool/dbconfig/20210624-081137-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1130 from s5 api T284897', diff saved to https://phabricator.wikimedia.org/P16713 and previous config saved to /var/cache/conftool/dbconfig/20210624-080945-marostegui.json
  • 08:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on 216 hosts with reason: Change replication monitoring config T284897
  • 08:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 0:45:00 on 216 hosts with reason: Change replication monitoring config T284897
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s6 weights T284897', diff saved to https://phabricator.wikimedia.org/P16712 and previous config saved to /var/cache/conftool/dbconfig/20210624-075613-marostegui.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s7 weights T284897', diff saved to https://phabricator.wikimedia.org/P16711 and previous config saved to /var/cache/conftool/dbconfig/20210624-074200-marostegui.json
  • 07:35 eileen: civicrm revision changed from 6d3dd6e5a5 to 9d1203fb28, config revision is 735af27f0d
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s8 weights T284897', diff saved to https://phabricator.wikimedia.org/P16710 and previous config saved to /var/cache/conftool/dbconfig/20210624-072657-marostegui.json
  • 03:57 dwisehaupt: civicrm revision is 6d3dd6e5a5, config revision is 735af27f0d
  • 03:26 dwisehaupt: civicrm revision is 6d3dd6e5a5, config revision is 1e8e9ac7b9
  • 00:25 eileen: civicrm revision changed from bd906975f0 to 6d3dd6e5a5, config revision is 821e5889f7
  • 00:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1447.eqiad.wmnet with reason: REIMAGE
  • 00:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
  • 00:13 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1447.eqiad.wmnet with reason: REIMAGE
  • 00:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
  • 00:11 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
  • 00:10 eileen: process-control config revision is 821e5889f7
  • 00:09 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
  • 00:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1443.eqiad.wmnet with reason: REIMAGE
  • 00:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1442.eqiad.wmnet with reason: REIMAGE
  • 00:05 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1443.eqiad.wmnet with reason: REIMAGE
  • 00:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1441.eqiad.wmnet with reason: REIMAGE
  • 00:03 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1442.eqiad.wmnet with reason: REIMAGE
  • 00:02 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
  • 00:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
  • 00:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1441.eqiad.wmnet with reason: REIMAGE

2021-06-23

  • 23:59 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
  • 23:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1438.eqiad.wmnet with reason: REIMAGE
  • 23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
  • 23:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1437.eqiad.wmnet with reason: REIMAGE
  • 23:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1438.eqiad.wmnet with reason: REIMAGE
  • 23:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
  • 23:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1437.eqiad.wmnet with reason: REIMAGE
  • 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
  • 23:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
  • 23:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
  • 23:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
  • 23:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1433.eqiad.wmnet with reason: REIMAGE
  • 23:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1432.eqiad.wmnet with reason: REIMAGE
  • 23:46 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
  • 23:45 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1431.eqiad.wmnet with reason: REIMAGE
  • 23:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1433.eqiad.wmnet with reason: REIMAGE
  • 23:43 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1432.eqiad.wmnet with reason: REIMAGE
  • 23:42 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1429.eqiad.wmnet with reason: REIMAGE
  • 23:41 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1430.eqiad.wmnet with reason: REIMAGE
  • 23:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1431.eqiad.wmnet with reason: REIMAGE
  • 23:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1428.eqiad.wmnet with reason: REIMAGE
  • 23:39 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1430.eqiad.wmnet with reason: REIMAGE
  • 23:37 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: REIMAGE
  • 23:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1427.eqiad.wmnet with reason: REIMAGE
  • 23:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1428.eqiad.wmnet with reason: REIMAGE
  • 23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1426.eqiad.wmnet with reason: REIMAGE
  • 23:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1427.eqiad.wmnet with reason: REIMAGE
  • 23:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1425.eqiad.wmnet with reason: REIMAGE
  • 23:31 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1426.eqiad.wmnet with reason: REIMAGE
  • 23:30 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1424.eqiad.wmnet with reason: REIMAGE
  • 23:29 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1425.eqiad.wmnet with reason: REIMAGE
  • 23:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1423.eqiad.wmnet with reason: REIMAGE
  • 23:27 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1424.eqiad.wmnet with reason: REIMAGE
  • 23:26 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
  • 23:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1423.eqiad.wmnet with reason: REIMAGE
  • 23:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
  • 23:23 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
  • 23:22 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.9
  • 23:21 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1420.eqiad.wmnet with reason: REIMAGE
  • 23:21 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
  • 23:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1419.eqiad.wmnet with reason: REIMAGE
  • 23:19 dduvall: rolling back 1.37.0-wmf.11 from group1 (T281152) due to reoccurrence of "PHP Notice: Undefined index: frameCount" now at PNGHandler.php:156 (T285431)
  • 23:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1420.eqiad.wmnet with reason: REIMAGE
  • 23:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1418.eqiad.wmnet with reason: REIMAGE
  • 23:17 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1419.eqiad.wmnet with reason: REIMAGE
  • 23:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1417.eqiad.wmnet with reason: REIMAGE
  • 23:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1418.eqiad.wmnet with reason: REIMAGE
  • 23:14 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 04s)
  • 23:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1416.eqiad.wmnet with reason: REIMAGE
  • 23:13 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
  • 23:13 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1417.eqiad.wmnet with reason: REIMAGE
  • 23:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1415.eqiad.wmnet with reason: REIMAGE
  • 23:11 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1416.eqiad.wmnet with reason: REIMAGE
  • 23:10 dduvall: re-rolling group1 to 1.37.0-wmf.11 (T281152) following deployment of blocker fixes
  • 23:09 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1415.eqiad.wmnet with reason: REIMAGE
  • 23:05 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes/media/GIFHandler.php: Backport: Check for _error in getting metadata array in GIFHandler (T285431) (duration: 01m 06s)
  • 22:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes/media/PNGHandler.php: Backport: Check for _error in getting metadata array in PNGHandler (T285431) (duration: 01m 06s)
  • 22:26 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1414.eqiad.wmnet with reason: REIMAGE
  • 22:24 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1414.eqiad.wmnet with reason: REIMAGE
  • 21:45 sbassett: Deployed updated security patch for T285190 to wmf.9 and wmf.11
  • 20:55 ejegg: updated payments-wiki from 42cfbe832d to d9892207c1
  • 20:38 eileen: civicrm revision changed from 53d103f672 to bd906975f0, config revision is 6a88618c3e
  • 20:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:42 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.9
  • 19:39 dduvall: rolling back wmf.11 from group1 due to increase in logspam possibly related to noted risky patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/693298 (cc T281152 and patch contact Amir1)
  • 19:35 herron: rebooting kafkamon hosts for updates
  • 19:26 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 06s)
  • 19:25 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
  • 19:20 dduvall: preparing to promote wmf.11 group1 (T281152) cc'ing risky patch contacts Amir1, Krinkle, DannyS712
  • 19:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6e0f5ad: Enable GrowthExperiments donor landing page for testing (T284799) (duration: 01m 05s)
  • 19:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/: 2338e53: Revert "Add custom signup flow for donors" (T284740; T284800; T285281) (duration: 01m 06s)
  • 18:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:55 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/: REVERT: 76e5fc9: Add custom signup flow for donors (T284740; T284800; T285281) (duration: 00m 38s)
  • 18:55 urbanecm@deploy1002: sync-file aborted: REVERT: 76e5fc9: Add custom signup flow for donors (T284740; T284800; T285281) (duration: 00m 01s)
  • 18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:54 urbanecm@deploy1002: Scap failed!: 6/9 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 18:53 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: 76e5fc9: Add custom signup flow for donors (T284740; T284800; T285281) (duration: 01m 07s)
  • 18:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/WikimediaEvents/extension.json: 01f034b: Finalize WMDEBanner* schema migration to Event Platform (T282562) (duration: 01m 05s)
  • 18:35 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: 17efbaf: EditGrowthConfig: Suggested edit "Learn more" link should support interwiki (T279886; T285385) (duration: 01m 06s)
  • 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3a2fc6e: Enable $wgSecurePollSingleTransferableVoteEnabled on beta sites (duration: 01m 05s)
  • 18:31 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@0535b94]: expect eventgate events for all datacenters, second try (duration: 09m 11s)
  • 18:22 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@0535b94]: expect eventgate events for all datacenters, second try
  • 18:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b4a7867: Make Growth features available to newcomers at lvwiki and skwiki (T278191; T284149) (duration: 01m 06s)
  • 17:58 herron: beginning rolling reboots of kafka-main100[1-5] for updates
  • 17:57 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable canary events for NavigationTiming ext streams - T271208, T266798 (duration: 01m 29s)
  • 17:07 herron: beginning rolling reboots of kafka-main200[1-5] for updates
  • 16:42 XioNoX: re-start sending traffic on the codfw-eqsin Telia transport link
  • 15:17 topranks: Removing peering to AS64050 / "BGP Consultancy Pte Ltd" at AMS-IX (cr2-esams). Peer has left IX.
  • 14:54 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s1
  • 14:53 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s8
  • 13:54 effie: rolling restart thanos-fe* to pick up new tegola-vector-tiles account - T283049
  • 13:45 volans: uploaded cumin_4.1.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:27 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s4
  • 12:59 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s3
  • 12:46 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s7
  • 12:35 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s6
  • 12:26 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s5
  • 12:15 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist s2 recountCategories.php --mode=pages && foreachwikiindblist s2 recountCategories.php --mode=subcats && foreachwikiindblist s2 recountCategories.php --mode=files # T170737
  • 11:46 XioNoX: Simplify labs-in4/6 firewall filters - CR700939
  • 11:10 topranks: Removing peering to AS39651 / "Com Hem AB" at AMS-IX (cr2-esams). Peer has left IX.
  • 10:44 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:35 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@9f16a6b]: (no justification provided) (duration: 00m 20s)
  • 09:35 mbsantos@deploy1002: Started deploy [kartotherian/deploy@9f16a6b]: (no justification provided)
  • 09:22 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:48 volans: sudo systemctl start ferm.service on thanos-fe2002 (DNS query timeout)
  • 08:34 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@9f16a6b]: (no justification provided) (duration: 00m 14s)
  • 08:34 mbsantos@deploy1002: Started deploy [kartotherian/deploy@9f16a6b]: (no justification provided)
  • 07:57 kart_: cxserver: Removed Matxin MT support and added more language support to Elia MT (T285199, T284900)
  • 07:54 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 07:49 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 07:46 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 07:26 legoktm: uploaded mailman3_3.3.3-1~bpo10+6_amd64.changes on apt1001
  • 07:08 legoktm: updating mailman packages on lists1001 and restarting (T285120, T280889)
  • 06:56 ryankemper: [WDQS] `ryankemper@wdqs1006:~$ sudo pool`
  • 06:37 ryankemper: [WDQS] `ryankemper@wdqs2001:~$ sudo pool`
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16703 and previous config saved to /var/cache/conftool/dbconfig/20210623-062819-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16702 and previous config saved to /var/cache/conftool/dbconfig/20210623-061316-root.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16701 and previous config saved to /var/cache/conftool/dbconfig/20210623-055812-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Start repooling db1100', diff saved to https://phabricator.wikimedia.org/P16700 and previous config saved to /var/cache/conftool/dbconfig/20210623-054252-marostegui.json
  • 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16699 and previous config saved to /var/cache/conftool/dbconfig/20210623-045217-root.json
  • 01:04 eileen: process-control config revision is 6a88618c3e
  • 00:50 eileen: civicrm revision changed from c745d4f075 to 03bead707d, config revision is 4ab72c1033
  • 00:40 legoktm: uploaded new versions of flufl.bounce_4.0-1_amd64.changes hyperkitty_1.3.4-2~bpo10+4_amd64.changes mailman3_3.3.3-1~bpo10+5_amd64.changes mailman-hyperkitty_1.1.0-10~bpo10+1_amd64.changes to apt1001
  • 00:02 Trey314159: reindexing Portuguese wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (T284185)
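
The urbanecm entries above run the same three-step recountCategories.php chain once per shard (s1 through s8), changing only the dblist name. A minimal Python sketch of how that command line is assembled is given here; it only builds the string and runs nothing. The helper name recount_command and the MODES tuple are hypothetical, introduced for illustration; the command names and the --mode values are copied from the log entries.

```python
# Modes as they appear in the logged commands.
MODES = ("pages", "subcats", "files")


def recount_command(shard, modes=MODES):
    """Build the chained command string for one shard.

    Hypothetical helper: it only formats text and does not execute
    foreachwikiindblist or any maintenance script.
    """
    return " && ".join(
        f"foreachwikiindblist {shard} recountCategories.php --mode={mode}"
        for mode in modes
    )


if __name__ == "__main__":
    # Reproduces the command text of the 12:15 entry for s2.
    print(recount_command("s2"))
```

Chaining with && means a later mode runs only if the previous one exited successfully, which matches how the logged commands were written.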

2021-06-22

  • 23:23 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable canary events for search event streams (duration: 01m 05s)
  • 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7865f27: Add unwatchedpages to rollbacker on frwiki (T285334) (duration: 01m 06s)
  • 23:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9a594f0: Enable Growth features in dark mode at nlwiki (T285254; 3/3) (duration: 01m 07s)
  • 23:05 urbanecm@deploy1002: Synchronized wmf-config/config/nlwiki.yaml: 9a594f0: Enable Growth features in dark mode at nlwiki (T285254; 2/3) (duration: 01m 05s)
  • 23:04 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 9a594f0: Enable Growth features in dark mode at nlwiki (T285254; 1/3) (duration: 01m 37s)
  • 22:42 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript recountCategories.php --wiki=zhwiki --mode=subcats # T170737
  • 22:41 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript recountCategories.php --wiki=zhwiki --mode=pages # T170737
  • 22:38 urbanecm: mwscript recountCategories.php --wiki=eowiktionary --mode={pages,subcats,files} (T170737)
  • 21:05 eileen: civicrm revision changed from 629bd3b7b7 to c745d4f075, config revision is 4ab72c1033
  • 21:05 ejegg: updated payments-wiki from 7be0534b91 to 42cfbe832d
  • 20:46 brennen: gitlab1001: running ansible to deploy CAS: stop marking users as external (T274461)
  • 20:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-web1001.eqiad.wmnet with reason: REIMAGE
  • 20:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-web1001.eqiad.wmnet with reason: REIMAGE
  • 20:12 Trey314159: reindexing Portuguese wikis on elastic@eqiad, elastic@codfw, and cloudelastic (T284185)
  • 20:12 Trey314159: reindexing Dutch wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (T284185)
  • 19:58 brennen: gitlab1001: run ansible to deploy https://gerrit.wikimedia.org/r/c/operations/gitlab-ansible/+/699812 (T264231)
  • 19:26 legoktm: set mediawiki-l message acceptance to discard non-member posts instead of reject
  • 19:09 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.11
  • 19:06 dduvall: preparing to promote wmf.11 group0 (T281152) cc'ing risky patch contacts Amir1, Krinkle, DannyS712
  • 19:01 dduvall@deploy1002: Pruned MediaWiki: 1.37.0-wmf.6 (duration: 03m 35s)
  • 18:46 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@75d35b4]: revert expect eventgate canary events in all dcs (duration: 04m 23s)
  • 18:42 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@75d35b4]: revert expect eventgate canary events in all dcs
  • 18:31 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thumbor1006.eqiad.wmnet with reason: REIMAGE
  • 18:30 awight@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/VisualEditor: Backport: Revert "Fall back from explicit parameter order to TemplateData sort" () (duration: 01m 09s)
  • 18:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thumbor1006.eqiad.wmnet with reason: REIMAGE
  • 18:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thumbor1005.eqiad.wmnet with reason: REIMAGE
  • 18:27 awight@deploy1002: sync-file aborted: Backport: Revert "Fall back from explicit parameter order to TemplateData sort" () (duration: 00m 40s)
  • 18:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thumbor1005.eqiad.wmnet with reason: REIMAGE
  • 18:19 legoktm: pulled in updates for thirdparty/kubeadm-k8s-1-18 buster-wikimedia on apt1001
  • 17:47 brennen: gitlab1001: run ansible to deploy https://gerrit.wikimedia.org/r/700851 (T274463)
  • 17:43 dduvall: testwikis to 1.37.0-wmf.11 (cc open blockers T285125 T285118 T271011)
  • 17:41 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.11 (duration: 30m 59s)
  • 17:21 moritzm: installing isc-dhcp security updates
  • 17:18 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:16 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:14 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:11 moritzm: installing ruby-websocket-extensions security updates
  • 17:10 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.11
  • 17:08 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:07 moritzm: installing velocity security updates
  • 17:07 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:04 dduvall: 1.37.0-wmf.11 was branched at c161d3b for T281152
  • 17:04 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 16:41 Trey314159: reindexing Dutch wikis on elastic@eqiad, elastic@codfw, and cloudelastic (T284185)
  • 14:57 dcausse@deploy1002: Finished deploy [wdqs/wdqs@b082ccc]: wdqs 0.3.74 (duration: 13m 26s)
  • 14:43 dcausse@deploy1002: Started deploy [wdqs/wdqs@b082ccc]: wdqs 0.3.74
  • 14:37 XioNoX: start updating analytics firewall rules to capirca generated ones on cr2-eqiad - T279429
  • 14:35 hoo: Updated the Wikidata property suggester with data from the 2021-05-31 JSON dump (with pre-applied T132839 workarounds)
  • 14:01 XioNoX: start updating analytics firewall rules to capirca generated ones on cr1-eqiad - T279429
  • 13:49 kormat: disabling puppet on A:db-all for T285079
  • 13:38 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki-staging/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=nlwiki --phab=T285254 # T285254
  • 13:37 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki-staging]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=nlwiki growthexperiments # T285254
  • 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Correctly enable Vector language switcher treatment A/B test (T269093) (duration: 00m 57s)
  • 13:29 urbanecm: urbanecm@mwmaint1002:~$ foreachwikiindblist growthexperiments extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments # T266913
  • 13:29 Trey314159: reindexing German wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (T284185)
  • 12:04 Lucas_WMDE: backport+config window done
  • 12:03 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable new Vector Languages-in-header feature & AB test for pilot wikis (T269093) (duration: 00m 56s)
  • 11:58 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug1001:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache && rmdir /srv/mediawiki/php-1.37.0-wmf.1' # per comments in T157030 and similar tasks
  • 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/UniversalLanguageSelector/: Backport: launchULS: Add context to interface.language.change hook (T280770) (duration: 00m 57s)
  • 11:35 moritzm: installing fluidsynth security updates
  • 11:17 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: enwiki: Remove 'collectionsaveascommunitypage' from the 'autoconfirmed' user group (T283523) (duration: 00m 56s)
  • 11:06 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16691 and previous config saved to /var/cache/conftool/dbconfig/20210622-110619-kormat.json
  • 10:51 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16690 and previous config saved to /var/cache/conftool/dbconfig/20210622-105115-kormat.json
  • 10:36 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16689 and previous config saved to /var/cache/conftool/dbconfig/20210622-103612-kormat.json
  • 10:21 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16688 and previous config saved to /var/cache/conftool/dbconfig/20210622-102108-kormat.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16687 and previous config saved to /var/cache/conftool/dbconfig/20210622-094019-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16686 and previous config saved to /var/cache/conftool/dbconfig/20210622-092515-root.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16685 and previous config saved to /var/cache/conftool/dbconfig/20210622-092056-root.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16684 and previous config saved to /var/cache/conftool/dbconfig/20210622-091012-root.json
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16683 and previous config saved to /var/cache/conftool/dbconfig/20210622-090552-root.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16682 and previous config saved to /var/cache/conftool/dbconfig/20210622-085508-root.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16681 and previous config saved to /var/cache/conftool/dbconfig/20210622-085049-root.json
  • 08:49 marostegui: Upgrade db1166
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16680 and previous config saved to /var/cache/conftool/dbconfig/20210622-084915-marostegui.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16679 and previous config saved to /var/cache/conftool/dbconfig/20210622-083545-root.json
  • 07:53 joe: uploaded wmf-certificates package to buster-wikimedia/main, T284417
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 T283499', diff saved to https://phabricator.wikimedia.org/P16678 and previous config saved to /var/cache/conftool/dbconfig/20210622-072828-marostegui.json
  • 06:43 dcausse: repool wdqs1005
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1100.eqiad.wmnet with reason: REIMAGE
  • 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1100.eqiad.wmnet with reason: REIMAGE
  • 05:06 marostegui: Stop replication on old s5 master (db1100) - T284529
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool old master running 10.1 T284529', diff saved to https://phabricator.wikimedia.org/P16677 and previous config saved to /var/cache/conftool/dbconfig/20210622-050602-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1130 to s5 master and set section read-write T284529', diff saved to https://phabricator.wikimedia.org/P16676 and previous config saved to /var/cache/conftool/dbconfig/20210622-050123-root.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T284529', diff saved to https://phabricator.wikimedia.org/P16675 and previous config saved to /var/cache/conftool/dbconfig/20210622-050036-root.json
  • 05:00 marostegui: Starting s5 eqiad failover from db1100 to db1130 - T284529
  • 04:20 marostegui: Start topology changes for s5 switchover T284529
  • 04:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s5 T284529
  • 04:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s5 T284529
  • 04:11 eileen: process-control config revision is 4ab72c1033
  • 01:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti2026.codfw.wmnet with reason: REIMAGE
  • 01:00 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2026.codfw.wmnet with reason: REIMAGE
  • 00:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti2025.codfw.wmnet with reason: REIMAGE
  • 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2025.codfw.wmnet with reason: REIMAGE
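
The db1123 repool entries for this day step the pooled percentage through 25%, 50%, 75% and 100% at 15-minute intervals (10:21, 10:36, 10:51, 11:06). A minimal Python sketch of that cadence is given here. It only computes a schedule and does not call dbctl; the function name repool_schedule, the default step list, and the default interval are assumptions read off the timestamps, not a description of the actual tooling.

```python
from datetime import datetime, timedelta


def repool_schedule(start, steps=(25, 50, 75, 100), interval_minutes=15):
    """Return (time, percentage) pairs for a staged repool.

    Hypothetical helper: the steps and interval are assumptions
    inferred from the log timestamps, not taken from dbctl itself.
    """
    interval = timedelta(minutes=interval_minutes)
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]


if __name__ == "__main__":
    # Start time taken from the first db1123 repool entry (10:21).
    for when, pct in repool_schedule(datetime(2021, 6, 22, 10, 21)):
        print(f"{when:%H:%M} pool at {pct}%")
```

Not every repool in the log follows this spacing exactly, so the interval is kept as a parameter rather than fixed.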

2021-06-21

  • 23:16 krinkle@deploy1002: Synchronized wmf-config/mc.php: I13646a5557c9 (duration: 00m 55s)
  • 23:12 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I302a71 (duration: 00m 56s)
  • 23:08 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: Idcac4d (duration: 00m 56s)
  • 23:05 krinkle@deploy1002: Synchronized wmf-config/mc.php: I877a3e (duration: 00m 57s)
  • 23:04 krinkle@deploy1002: Synchronized wmf-config/mc.php: Icc2676 (duration: 00m 56s)
  • 22:57 krinkle@deploy1002: Synchronized wmf-config/mc.php: Iea94283c53 (duration: 00m 57s)
  • 22:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: Iea94283c53 (duration: 00m 57s)
  • 22:42 eileen: civicrm revision changed from 0fca489063 to 629bd3b7b7, config revision is 2aed6ff89b
  • 22:41 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=viwiki --fix # T284868 # P16674
  • 22:13 eileen: civicrm revision changed from acbcce94a2 to 0fca489063, config revision is 2aed6ff89b
  • 21:11 sbassett: Deployed security patch for T285190
  • 19:19 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on doh1001.wikimedia.org with reason: temporarily depooling host
  • 19:19 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on doh1001.wikimedia.org with reason: temporarily depooling host
  • 18:41 ppchelko@deploy1002: Synchronized wmf-config/wikitech.php: Replace uses of AbstractBlock::getTarget() T284141 (duration: 00m 58s)
  • 18:30 urbanecm@deploy1002: Synchronized wmf-config/PoolCounterSettings.php: af61f1a: Add pool counter for automated search requests (T284479) (duration: 00m 59s)
  • 18:30 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@40b4b2f]: T273854 Airflow dag to extract and process sparql queries (duration: 07m 11s)
  • 18:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f7db2b9: Enable wikilove on hewikisource (T284864) (duration: 00m 56s)
  • 18:26 brennen: gitlab1001: running ansible for copying latest backup to dedicated folder (T274463)
  • 18:24 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewikisource wikilove # T284864
  • 18:23 urbanecm: Correction: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hiwikisource wikilove # T284864
  • 18:23 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hiwikisource # T284864
  • 18:22 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@40b4b2f]: T273854 Airflow dag to extract and process sparql queries
  • 18:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: dd0fecb: Rename Portal and Portal talk namespaces on viwiki (T284868) (duration: 00m 56s)
  • 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5d8b9df: Disable Education Program namespaces in enwiki (T285193) (duration: 00m 58s)
  • 18:14 urbanecm@deploy1002: Synchronized wmf-config/abusefilter.php: 5a51dd2: Add `managechangetags` to the `abusefilter` group on eswiki (T285167) (duration: 00m 56s)
  • 18:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 219dd5b: eswiki AbuseFilter config changes (T284797; 2/2) (duration: 00m 56s)
  • 18:10 urbanecm@deploy1002: Synchronized wmf-config/abusefilter.php: 219dd5b: eswiki AbuseFilter config changes (T284797; 1/2) (duration: 01m 07s)
  • 17:40 ebernhardson: post-deploy restart airflow-webserver and airflow-scheduler on an-airflow1001
  • 17:32 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2337592]: airflow: expect eventgate canary events in all dcs (duration: 04m 24s)
  • 17:27 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2337592]: airflow: expect eventgate canary events in all dcs
  • 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:32 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:46 papaul: poweroff elastic2043 for maintenance
  • 15:25 hashar: Updated operations-puppet-tests-buster-docker Jenkins job to use latest Docker image https://gerrit.wikimedia.org/r/c/integration/config/+/700648
  • 15:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1009.eqiad.wmnet
  • 15:02 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
  • 15:01 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps1009.eqiad.wmnet
  • 14:57 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
  • 14:57 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
  • 14:52 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
  • 14:52 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 14:47 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
  • 14:44 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 14:40 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 14:39 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet
  • 14:37 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1002.eqiad.wmnet
  • 14:37 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
  • 14:34 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
  • 14:30 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
  • 14:28 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
  • 14:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
  • 14:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1123.eqiad.wmnet with reason: REIMAGE
  • 14:22 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
  • 14:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1123.eqiad.wmnet with reason: REIMAGE
  • 14:21 volans: deployed spicerack release v0.0.54 on the cumin hosts
  • 14:19 XioNoX: reboot scs-c1-codfw - T285229
  • 14:18 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
  • 14:17 XioNoX: reboot scs-a1-codfw - T285229
  • 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1008.eqiad.wmnet
  • 14:16 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
  • 14:14 klausman: starting update of ML team's etcd machines in eqiad
  • 14:14 volans: uploaded spicerack_0.0.54 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 14:11 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
  • 14:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1008.eqiad.wmnet
  • 14:06 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 14:05 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
  • 14:04 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
  • 13:58 XioNoX: reboot scs-eqsin - T285229
  • 13:58 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
  • 13:57 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1006.eqiad.wmnet
  • 13:56 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 13:55 jynus: stopping replication at db1171:s3 at db1123-bin.004363:906878073
  • 13:51 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
  • 13:51 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
  • 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1006.eqiad.wmnet
  • 13:48 XioNoX: reboot scs-ulsfo
  • 13:45 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
  • 13:40 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
  • 13:38 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
  • 13:35 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet
  • 13:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/MobileFrontend/includes/ExtMobileFrontend.php: Backport: Avoid loading the whole entity when it only needs description. (T269960) (duration: 00m 58s)
  • 13:28 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
  • 13:24 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2003.codfw.wmnet
  • 13:21 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2003.codfw.wmnet
  • 13:21 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2002.codfw.wmnet
  • 13:19 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2002.codfw.wmnet
  • 13:17 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2001.codfw.wmnet
  • 13:14 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2001.codfw.wmnet
  • 13:12 elukey: upload istioctl 1.9.5 to {buster,stretch}-wikimedia
  • 13:12 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 40 hosts with reason: Merged broken patch
  • 13:12 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 40 hosts with reason: Merged broken patch
  • 13:09 klausman: starting update of ML team's etcd machines in codfw
  • 12:55 godog: move librenms alerts with "max alerts" == -1 to "interval" being 15m - T285205
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16672 and previous config saved to /var/cache/conftool/dbconfig/20210621-124030-root.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16671 and previous config saved to /var/cache/conftool/dbconfig/20210621-123906-root.json
  • 12:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Wikibase: Backport: Rewrite SerializationModifier to be more efficient (duration: 01m 02s)
  • 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1010.eqiad.wmnet
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16670 and previous config saved to /var/cache/conftool/dbconfig/20210621-122526-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16669 and previous config saved to /var/cache/conftool/dbconfig/20210621-122403-root.json
  • 12:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1010.eqiad.wmnet
  • 12:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2008.codfw.wmnet
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16668 and previous config saved to /var/cache/conftool/dbconfig/20210621-121023-root.json
  • 12:10 godog: bump space for k8s and ops prometheus on prometheus1004 (prometheus1003 has been expanded previously but not logged)
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16667 and previous config saved to /var/cache/conftool/dbconfig/20210621-120859-root.json
  • 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2008.codfw.wmnet
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16665 and previous config saved to /var/cache/conftool/dbconfig/20210621-115519-root.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 T283499', diff saved to https://phabricator.wikimedia.org/P16664 and previous config saved to /var/cache/conftool/dbconfig/20210621-115441-marostegui.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16663 and previous config saved to /var/cache/conftool/dbconfig/20210621-115355-root.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 T283499', diff saved to https://phabricator.wikimedia.org/P16662 and previous config saved to /var/cache/conftool/dbconfig/20210621-115143-marostegui.json
  • 11:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0bf35e0: Disable indexing user (sub)pages and draft-related pages on hrwiki (T284384) (duration: 00m 56s)
  • 11:21 urbanecm@deploy1002: Synchronized logos/config.yaml: 1b97376: Change vi.wikisource logo to the same logo being used at en.wikisource (T284612) (duration: 00m 56s)
  • 11:20 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 1b97376: Change vi.wikisource logo to the same logo being used at en.wikisource (T284612) (duration: 00m 57s)
  • 11:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 464cc0b: ptwikinews: Remove NS ID 102,103 (T285163) (duration: 00m 56s)
  • 11:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Add WMCS public addresses to $wgSoftBlockRanges (duration: 00m 56s)
  • 11:04 jbond@deploy1002: Finished deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 (duration: 02m 53s)
  • 11:01 jbond@deploy1002: Started deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4
  • 10:55 moritzm: restarting FPM on mw canaries to pick up nettle security updates
  • 10:45 volans@deploy1002: Finished deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv (duration: 00m 54s)
  • 10:45 moritzm: installing nettle security updates on buster
  • 10:44 volans@deploy1002: Started deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv
  • 10:44 volans@deploy1002: Finished deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv (duration: 00m 54s)
  • 10:43 volans@deploy1002: Started deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv
  • 10:41 volans@deploy1002: Finished deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv (duration: 00m 50s)
  • 10:41 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:40 volans@deploy1002: Started deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv
  • 10:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:37 jbond@deploy1002: Finished deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 (duration: 02m 22s)
  • 10:36 jbond@deploy1002: Started deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4
  • 10:36 jbond@deploy1002: Finished deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 to netbox-next (duration: 00m 56s)
  • 10:29 jbond@deploy1002: Started deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 to netbox-next
  • 10:27 jbond@deploy1002: Finished deploy [netbox/deploy@6b69f2c]: deploy v2.10.4-wmf4 to netbox-next (duration: 03m 12s)
  • 10:24 jbond@deploy1002: Started deploy [netbox/deploy@6b69f2c]: deploy v2.10.4-wmf4 to netbox-next
  • 10:22 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 02m 22s)
  • 10:20 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
  • 10:19 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 02m 13s)
  • 10:17 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
  • 10:16 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 01m 03s)
  • 10:15 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
  • 10:15 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 01m 30s)
  • 10:13 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
  • 10:13 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 (duration: 03m 10s)
  • 10:10 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4
  • 09:55 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/FlaggedRevs: Backport: Drop LocalFile::getHistory hook handler (T284777 T277883) (duration: 00m 58s)
  • 09:52 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Enable wikisource group as langlink group of sourcewiki (T275958) (duration: 00m 56s)
  • 09:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set $wmgWikibaseTmpSerializeEmptyListsAsObjects to true everywhere (T241422) (duration: 00m 57s)
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16659 and previous config saved to /var/cache/conftool/dbconfig/20210621-094049-root.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1130 with weight 0 T284529', diff saved to https://phabricator.wikimedia.org/P16658 and previous config saved to /var/cache/conftool/dbconfig/20210621-092623-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16657 and previous config saved to /var/cache/conftool/dbconfig/20210621-092545-root.json
  • 09:19 ladsgroup@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 04m 49s)
  • 09:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16656 and previous config saved to /var/cache/conftool/dbconfig/20210621-091041-root.json
  • 09:02 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:56 marostegui: Deploy T266486 T268392 T273360 on db1123
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16655 and previous config saved to /var/cache/conftool/dbconfig/20210621-085538-root.json
  • 08:31 dcausse: depooling wdqs1005 (lag)
  • 07:47 moritzm: updated buster d-i image for Buster 10.10 point release (which included ABI bump for Linux kernel)
  • 07:44 jayme: started debian-weekly-rebuild.service on deneb (it failed due to 404 on snapshots.debian.org yesterday)
  • 06:49 moritzm: installing libwebp security updates on buster
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16654 and previous config saved to /var/cache/conftool/dbconfig/20210621-062156-root.json
  • 06:20 marostegui: Re-add rev_page_id to db1135 T163532 T285149
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 T163532', diff saved to https://phabricator.wikimedia.org/P16653 and previous config saved to /var/cache/conftool/dbconfig/20210621-062014-marostegui.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16652 and previous config saved to /var/cache/conftool/dbconfig/20210621-060652-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16651 and previous config saved to /var/cache/conftool/dbconfig/20210621-055149-root.json
  • 05:50 kart_: cxserver: Added support for Elia MT + Updated to 2021-06-10-074331-production (T276059, T275803, T276246, T283513, T255231, T237028)
  • 05:41 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16650 and previous config saved to /var/cache/conftool/dbconfig/20210621-053645-root.json
  • 05:33 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:31 kormat: stopping replication on db1123 T283131
  • 05:25 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 05:11 kormat@cumin1001: dbctl commit (dc=all): 'Depool db1123 until it's reimaged to buster T284648', diff saved to https://phabricator.wikimedia.org/P16649 and previous config saved to /var/cache/conftool/dbconfig/20210621-051149-kormat.json
  • 05:05 kormat@cumin1001: dbctl commit (dc=all): 'Promote db1157 to s3 master and set section read-write T284648', diff saved to https://phabricator.wikimedia.org/P16648 and previous config saved to /var/cache/conftool/dbconfig/20210621-050506-kormat.json
  • 05:03 kormat@cumin1001: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T284648', diff saved to https://phabricator.wikimedia.org/P16647 and previous config saved to /var/cache/conftool/dbconfig/20210621-050304-kormat.json
  • 05:02 kormat: Starting s3 eqiad failover from db1123 to db1157 - T284648
  • 04:49 kormat@cumin1001: dbctl commit (dc=all): 'Set db1157 with weight 0 T284648', diff saved to https://phabricator.wikimedia.org/P16646 and previous config saved to /var/cache/conftool/dbconfig/20210621-044955-kormat.json
  • 04:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 21 hosts with reason: Master switchover s3 T284648
  • 04:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 21 hosts with reason: Master switchover s3 T284648
  • 04:40 marostegui: Re-add rev_page_id to db1099:3311 T163532 T285149
  • 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 T163532', diff saved to https://phabricator.wikimedia.org/P16645 and previous config saved to /var/cache/conftool/dbconfig/20210621-043941-marostegui.json

2021-06-18

  • 20:55 Krinkle: Remove doc1001:/srv/doc/mediawiki-core/wmf-1.36.0-wmf.31-testing
  • 13:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16640 and previous config saved to /var/cache/conftool/dbconfig/20210618-125306-root.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16639 and previous config saved to /var/cache/conftool/dbconfig/20210618-123802-root.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16638 and previous config saved to /var/cache/conftool/dbconfig/20210618-122526-root.json
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16637 and previous config saved to /var/cache/conftool/dbconfig/20210618-122259-root.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16636 and previous config saved to /var/cache/conftool/dbconfig/20210618-121022-root.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16635 and previous config saved to /var/cache/conftool/dbconfig/20210618-120755-root.json
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16634 and previous config saved to /var/cache/conftool/dbconfig/20210618-115518-root.json
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16633 and previous config saved to /var/cache/conftool/dbconfig/20210618-114015-root.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16631 and previous config saved to /var/cache/conftool/dbconfig/20210618-112739-marostegui.json
  • 09:44 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:21 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:49 XioNoX: eqsin-codfw link re-enabled but drained
  • 08:39 legoktm: finished adding shellbox LVS entry, https://shellbox.svc.eqiad.wmnet:4008/ and https://shellbox.svc.codfw.wmnet:4008/ now work (T281423)
  • 08:30 XioNoX: cr1-codfw# set interfaces xe-5/1/2 disable
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16630 and previous config saved to /var/cache/conftool/dbconfig/20210618-081737-root.json
  • 08:06 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16629 and previous config saved to /var/cache/conftool/dbconfig/20210618-080233-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16628 and previous config saved to /var/cache/conftool/dbconfig/20210618-074729-root.json
  • 07:44 legoktm: restarting pybal on lvs1015, lvs2009 (active) - T281423
  • 07:35 legoktm: restarting pybal on lvs1016, lvs2010 to add shellbox
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16627 and previous config saved to /var/cache/conftool/dbconfig/20210618-073225-root.json
  • 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2010.codfw.wmnet
  • 07:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2010.codfw.wmnet
  • 06:58 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1002.wikimedia.org
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16626 and previous config saved to /var/cache/conftool/dbconfig/20210618-063632-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P16625 and previous config saved to /var/cache/conftool/dbconfig/20210618-062452-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16624 and previous config saved to /var/cache/conftool/dbconfig/20210618-062129-root.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16623 and previous config saved to /var/cache/conftool/dbconfig/20210618-060625-root.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16622 and previous config saved to /var/cache/conftool/dbconfig/20210618-060452-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16621 and previous config saved to /var/cache/conftool/dbconfig/20210618-055122-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16620 and previous config saved to /var/cache/conftool/dbconfig/20210618-054949-root.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165', diff saved to https://phabricator.wikimedia.org/P16619 and previous config saved to /var/cache/conftool/dbconfig/20210618-054841-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16618 and previous config saved to /var/cache/conftool/dbconfig/20210618-054659-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16617 and previous config saved to /var/cache/conftool/dbconfig/20210618-053445-root.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16616 and previous config saved to /var/cache/conftool/dbconfig/20210618-053156-root.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16615 and previous config saved to /var/cache/conftool/dbconfig/20210618-051942-root.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131', diff saved to https://phabricator.wikimedia.org/P16614 and previous config saved to /var/cache/conftool/dbconfig/20210618-051712-marostegui.json
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16613 and previous config saved to /var/cache/conftool/dbconfig/20210618-051652-root.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16612 and previous config saved to /var/cache/conftool/dbconfig/20210618-050148-root.json
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16611 and previous config saved to /var/cache/conftool/dbconfig/20210618-045808-marostegui.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16610 and previous config saved to /var/cache/conftool/dbconfig/20210618-045743-marostegui.json
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16609 and previous config saved to /var/cache/conftool/dbconfig/20210618-045355-marostegui.json

2021-06-17

  • 21:49 legoktm: regenerating pipermail redirects to skip those with duplicate message-ids (T280731)
  • 18:24 ryankemper: T285106 [WDQS] `ryankemper@wdqs2001:~$ sudo depool`
  • 18:01 dancy: Deployed latest scap code to beta cluster
  • 13:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Wikibase/client/includes/ClientHooks.php: Backport: client: Bring back using the client setting for langlink group (T284854) (duration: 00m 58s)
  • 13:28 jbond: add prometheus-jmx-exporter to bullseye-wikimedia
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16604 and previous config saved to /var/cache/conftool/dbconfig/20210617-121146-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16603 and previous config saved to /var/cache/conftool/dbconfig/20210617-120109-root.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16602 and previous config saved to /var/cache/conftool/dbconfig/20210617-115643-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16601 and previous config saved to /var/cache/conftool/dbconfig/20210617-115319-root.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16600 and previous config saved to /var/cache/conftool/dbconfig/20210617-114605-root.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16599 and previous config saved to /var/cache/conftool/dbconfig/20210617-114139-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16598 and previous config saved to /var/cache/conftool/dbconfig/20210617-113816-root.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16597 and previous config saved to /var/cache/conftool/dbconfig/20210617-113101-root.json
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16596 and previous config saved to /var/cache/conftool/dbconfig/20210617-112635-root.json
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P16595 and previous config saved to /var/cache/conftool/dbconfig/20210617-112431-marostegui.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16594 and previous config saved to /var/cache/conftool/dbconfig/20210617-112312-root.json
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16593 and previous config saved to /var/cache/conftool/dbconfig/20210617-111558-root.json
  • 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16592 and previous config saved to /var/cache/conftool/dbconfig/20210617-111026-marostegui.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16591 and previous config saved to /var/cache/conftool/dbconfig/20210617-110808-root.json
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16590 and previous config saved to /var/cache/conftool/dbconfig/20210617-110656-root.json
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16589 and previous config saved to /var/cache/conftool/dbconfig/20210617-110200-marostegui.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16588 and previous config saved to /var/cache/conftool/dbconfig/20210617-105153-root.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16587 and previous config saved to /var/cache/conftool/dbconfig/20210617-103649-root.json
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16586 and previous config saved to /var/cache/conftool/dbconfig/20210617-102145-root.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P16585 and previous config saved to /var/cache/conftool/dbconfig/20210617-101827-marostegui.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16584 and previous config saved to /var/cache/conftool/dbconfig/20210617-100445-root.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16583 and previous config saved to /var/cache/conftool/dbconfig/20210617-094942-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16582 and previ