You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

From Wikitech-static
Jump to navigation Jump to search
imported>Labslogbot
(upgraded most packages on sodium (bblack))
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
Line 1: Line 1:
== 2015-07-12 ==
== 2021-08-03 ==
* 14:59 bblack: upgraded most packages on sodium
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master  [[phab:T287574|T287574]]
* 09:12 moritzm: installinh php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2015-07-12 ==
== 2021-08-02 ==
* 14:48 bblack: upgraded apache2 to 2.2.22-1ubuntu1.9 on: antimony argon caesium fluorine helium iodine logstash1001 logstash1003 magnesium neon netmon1001 rhodium stat1001 ytterbium
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 21:31 tzatziki: removing 1 file for legal compliance
* 21:16 tzatziki: removing 7 files for legal compliance
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 19:00 urbanecm: Morning B&C window completed
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 12:20 mutante: gerrit servers: disabling puppet
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 11:27 hashar: restarting Jenkins on contint2001
* 11:27 hashar: restarting Jenkins on contint1001
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 urbanecm: EU B&C window completed
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 11:08 moritzm: installing openjdk-11 security updates
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 07:24 moritzm: installing libsndfile security updates on buster
* 07:12 moritzm: installing aspell security updates
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)


== 2015-07-12 ==
== 2021-07-31 ==
* 04:49 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 12 04:49:08 UTC 2015 (duration 49m 7s)
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}


== 2015-07-12 ==
== 2021-07-30 ==
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-12 02:26:52+00:00
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 11:23 moritzm: installing libsndfile security updates on stretch
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json


== 2015-07-12 ==
== 2021-07-29 ==
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 12 02:25:33 UTC 2015 (duration 25m 32s)
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:11 vgutierrez: restart pybal on lvs2009
* 14:09 vgutierrez: restart pybal on lvs2010
* 14:07 vgutierrez: restart pybal on lvs2008
* 14:05 vgutierrez: restart pybal on lvs2007
* 13:59 vgutierrez: restart pybal on lvs1014
* 13:55 vgutierrez: restart pybal on lvs1015
* 13:52 _joe_: restarting pybal on lvs1016
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:52 moritzm: restarting Tomcat on idp-test
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}


== 2015-07-12 ==
== 2021-07-28 ==
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 12s)
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 13:29 moritzm: installing python2.7 security updates on stretch
* 13:08 moritzm: installing python3.5 security updates on stretch
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:27 moritzm: installing nginx security updates on thumbor*
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:27 Amir1: running several long-running queries against pc1007
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 07:53 moritzm: installing aspell security updates on stretch
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php


== 2015-07-12 ==
== 2021-07-27 ==
* 02:10 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-12 02:10:00+00:00
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 14:33 mmandere: depool cp10[79-82]).eqiad.wmnet - [[phab:T286061|T286061]]
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:11 moritzm: installing aspell security updates
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support.  Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 11:23 Lucas_WMDE: EU backport+config window done
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 08:57 _joe_: repooling mw225[12] for apis
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 08:36 jynus: reenabled puppet on mwmaint1002
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 07:52 jynus: disabling puppet on mwmaint1002
* 07:14 moritzm: installing krb security updates on buster
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json


== 2015-07-12 ==
== 2021-07-26 ==
* 02:09 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}},
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 16:06 legoktm: depooled mw2336.codfw.mwnet, mgmt is down too. [[phab:T287394|T287394]]
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 06:39 moritzm: installing krb5 security updates
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki


== 2015-07-11 ==
== 2021-07-24 ==
* 19:48 jynus: stopping labsdb1002 after table corruption has been detected
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php  --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]


== 2015-07-11 ==
== 2021-07-23 ==
* 19:37 urandom: from restbase1002, starting revision culling process (node thin_out_key_rev_value_data.js `hostname -i` local_group_wikimedia_T_parsoid_html 2>&1 | tee >(gzip -c > local_group_wikimedia_T_parsoid_html.log.`date +%s`.gz))
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 16:15 effie: enable puppet on mc-gp* hosts
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]


== 2015-07-11 ==
== 2021-07-22 ==
* 19:33 urandom: restbase: setting gc_grace_seconds to 604800 (1 week) on local_group_wikipedia_T_parsoid_html.data
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platfom migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 14:27 moritzm: installing libwebp security updates on stretch
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 11:36 Lucas_WMDE: EU backport+config window done
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:42 effie: restart pybal on lvs2010  and lvs1016
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE


== 2015-07-11 ==
== 2021-07-21 ==
* 04:55 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 11 04:55:56 UTC 2015 (duration 55m 55s)
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:27 dancy: testing upcoming Scap release on beta
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 10:50 moritzm: installing systemd security updates on bullseye
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 10:14 effie: enable puppet on mw* servers
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 08:45 effie: disble puppet on codfw mw hosts to deploy 702592
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:17 effie: enable puppet on alert*
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 07:16 godog: powercycle ms-be2048
* 07:03 moritzm: installing systemd security updates on stretch
* 06:51 effie: restart memcached on eqiad mc* hosts
* 06:51 effie: enable puppet on mc* hosts
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2015-07-11 ==
== 2021-07-20 ==
* 04:21 bd808: Logstash cluster upgrade complete! Kibana working again
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 04:21 bd808: Upgraded Elasticsearch to 1.6.0 on logstash1006
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 04:12 bd808: rebooting logstash1006
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 04:06 bd808: logstash1005 fully recovered all shards
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 03:21 logmsgbot: mattflaschen Synchronized php-1.26wmf13/extensions/Flow/includes/Parsoid/Utils.php: Bump Flow to encode page name when sending to Parsoid (duration: 00m 13s)
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-11 02:28:18+00:00
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 02:25 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 06m 07s)
* 17:06 rzl: enabled puppet on A:mw
* 02:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 11 02:25:19 UTC 2015 (duration 25m 18s)
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 02:09 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-11 02:09:45+00:00
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 02:09 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 35s)
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 00:46 bd808: Upgraded Elasticsearch to 1.6.0 on logstash1005; replicas recovering now
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 00:34 bd808: rebooting logstash1005
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 00:30 bd808: logstash1004 fully recovered all shards
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
* 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 14:53 urbanecm: Start server-side upload for 7 large PNG files ([[phab:T285708|T285708]])
* 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:46 vgutierrez: depool dns1002 - [[phab:T286069|T286069]]
* 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
* 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10{{!}}0[1-9]).codfw.wmnet
* 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - [[phab:T285643|T285643]]
* 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 12:44 moritzm: installing systemd security updates on buster
* 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
* 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
* 11:58 Lucas_WMDE: EU config+backport window done
* 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (3/3) (duration: 00m 56s)
* 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (2/3) (duration: 00m 56s)
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (1/3) (duration: 00m 56s)
* 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 3/3) (duration: 00m 56s)
* 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 2/3) (duration: 00m 56s)
* 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 1/3) (duration: 00m 57s)
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705498{{!}}Add patroller group for ckbwiki (T285221)]] (duration: 00m 57s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (2/2, beta) (duration: 00m 56s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (1/2, prod) (duration: 00m 57s)
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704867{{!}}Update config for language switching on pilot wikis (T286459)]] (duration: 00m 59s)
* 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
* 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
* 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
* 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
* 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
* 03:17 eileen: civicrm revision changed from {{Gerrit|20e9ef6bbb}} to {{Gerrit|819c11307d}}, config revision is {{Gerrit|bb405c5232}}


== 2015-07-10 ==
== 2021-07-19 ==
* 22:51 mutante: tendril: very short maintenance downtime
* 20:48 urbanecm: Deploy security patch for [[phab:T286884|T286884]]
* 20:10 bd808: `service elasticsearch start` not starting on logstash1004; investigating
* 20:29 vgutierrez: pool text@codfw - [[phab:T286921|T286921]]
* 20:07 bd808: ran apt-get upgrade on logstash1004
* 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:52 mutante: adminbot - built and imported 1.7.10 into APT repo
* 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
* 19:43 bd808: rebooting logstash1004
* 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467{{!}}prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)
* 19:40 bd808: Kibana seems to be broken by mixed 1.6.0/1.3.9 cluster
* 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
* 19:32 bd808: kibana not seeing indices after upgrading elasticsearch to 1.6.0; investigating
* 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448{{!}}Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)
* 19:26 bd808: Upgraded logstash1003 to elasticsearch 1.6.0
* 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - [[phab:T286921|T286921]]
* 19:22 bd808: Upgraded logstash1002 to elasticsearch 1.6.0
* 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - [[phab:T286921|T286921]]
* 19:19 bd808: Upgraded logstash1001 to elasticsearch 1.6.0
* 18:46 brennen: gerrit1001: restarting gerrit
* 19:10 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/VisualEditor/lib/ve/src/ce/nodes/ve.ce.TableNode.js: https://gerrit.wikimedia.org/r/#/c/224122/ (duration: 00m 12s)
* 18:40 vgutierrez: stop pybal on lvs2009  - [[phab:T286921|T286921]]
* 18:11 gwicke: ansible -i production restbase -a 'nodetool setcompactionthroughput 120'
* 18:38 brennen: re-enabling puppet on gerrit1001]
* 18:00 gwicke: ansible -i production restbase -a 'nodetool setcompactionthroughput 90'
* 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - [[phab:T286921|T286921]]
* 17:49 gwicke: rolling restart of the cassandra cluster to apply https://gerrit.wikimedia.org/r/#/c/224114/
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 17:32 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: prevent race condition on writing settings (duration: 00m 13s)
* 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
* 17:26 moritzm: installed python security updates on mc*
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P<nowiki>{</nowiki>cloudelastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 17:25 Coren: rebooting labstore2001 (experiments with the new raid setup caused the mapper table to fill)
* 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - [[phab:T286921|T286921]]
* 16:35 mobrovac: restbase deploying hotfix for T105509
* 18:20 vgutierrez: enabling pybal on lvs2007 - [[phab:T286921|T286921]]
* 15:29 mobrovac: restbase restarted restabse on restbase1004
* 18:19 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 15:25 godog: bounce cassandra on restbae1004
* 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
* 13:43 godog: bounce cassandra on restbae1004
* 18:06 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:705437{{!}}pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s)
* 13:37 _joe_: temporarily repooled mw1031
* 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 12:40 godog: bounce cassandra on restbae1004
* 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 07:43 godog: reimage ms-be2013 T105213
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 04:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 10 04:36:49 UTC 2015 (duration 36m 48s)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 04:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037; repool db1030 (revert below) (duration: 00m 12s)
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
* 04:28 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1037; depool db1030 (duration: 00m 13s)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 03:14 mutante: re-enabling puppet on tools-exec-1213, working around adminbot package install fail
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
* 02:59 elee: please log this with the year
* 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 02:53 andrewbogott: testing the log by logging a test
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
* 01:50 gwicke: bounced cassandra on restbase1004
* 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 01:38 jgage: cassandra restarted on restbase1004
* 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
* 00:39 urandom: starting restbase1004
* 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
* 00:35 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/VisualEditor/modules/ve-mw/ui/inspectors/ve.ui.MWLinkAnnotationInspector.js: https://gerrit.wikimedia.org/r/#/c/223983/ (duration: 00m 12s)
* 17:30 volans: running puppet on elastic2038 after nework was restored
* 00:15 hoo: Updated WikibaseQualityConstraints data on wikidata (wikidatawiki.wbqc_constraints)
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
* 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:23 volans: running authdns-update to force-update authdns2001
* 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:21 XioNoX: remove ns1 redirect - [[phab:T286787|T286787]]
* 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
* 17:10 XioNoX: enable asw-a2-codfw access ports - [[phab:T286787|T286787]]
* 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - [[phab:T286787|T286787]]
* 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
* 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
* 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
* 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
* 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
* 16:40 XioNoX: update asw-a2-codfw serial number - [[phab:T286787|T286787]]
* 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
* 16:21 mutante: depooled logstash2021 for dcops maintenance work
* 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
* 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
* 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|310be45f7}} (duration: 00m 57s)
* 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
* 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
* 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2bdfbd258e}} (duration: 00m 57s)
* 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I069c7b53}} (duration: 00m 58s)
* 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
* 15:10 godog: +100G to prometheus/ops in codfw
* 14:59 vgutierrez: rolling restart of eqiad pybal instances
* 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:42 vgutierrez: rolling restart of codfw pybal instances
* 14:33 vgutierrez: rolling restart of eqsin pybal instances
* 14:23 vgutierrez: rolling restart of ulsfo pybal instances
* 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
* 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
* 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
* 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
* 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
* 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
* 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
* 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
* 11:40 moritzm: installing bluez security updates
* 11:31 Lucas_WMDE: EU backport+config window done
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:703205{{!}}Add config for updated PropertySuggester beta cluster (T285098)]] (beta-only) (duration: 00m 57s)
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 09:52 moritzm: imported megacli for bullseye-wikimedia [[phab:T282272|T282272]] [[phab:T275873|T275873]]
* 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
* 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
* 08:15 vgutierrez: depool codfw text traffic
* 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
* 03:26 twentyafterfour: restarted phd on phab1001
* 03:25 twentyafterfour: investigating PHD failure


== July 9 ==
== 2021-07-16 ==
* 23:41 legoktm: deployed patch for T105413
* 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:07 gwicke: bounced cassandra on restbase1004
* 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:02 logmsgbot: catrope Synchronized wmf-config/CommonSettings.php: TitleBlacklist: Don't block account auto-creation (duration: 00m 13s)
* 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 22:09 logmsgbot: oblivian Synchronized wmf-config/PoolCounterSettings-eqiad.php: I don't think we want to keep poolcounter running on an imagescaler (duration: 00m 12s)
* 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 21:30 logmsgbot: tgr Synchronized php-1.26wmf13/extensions/OAuth/api/MWOAuthAPI.setup.php: no canonical redirects for requests with OAuth headers (duration: 00m 12s)
* 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
* 21:05 tgr: backporting https://gerrit.wikimedia.org/r/#/c/223952/- fixes OAuth which is broken for 1.26wmf13
* 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
* 20:47 gwicke: temporarily disabled puppet on cassandra nodes while tweaking settings
* 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P<nowiki>{</nowiki>elastic2*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
* 19:53 legoktm: manually fixing global merge of Yuvipanda->YuviPanda (T104686)
* 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
* 19:04 gwicke: bounced cassandra on restbase1004
* 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
* 18:29 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf13
* 15:48 vgutierrez: restart pybal on lvs2010
* 17:54 gwicke: bounced restbase on restbase1005
* 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 17:32 ori: installed poolcounter on mw1154
* 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 17:31 logmsgbot: ori Synchronized wmf-config/PoolCounterSettings-eqiad.php: (no message) (duration: 00m 12s)
* 15:24 godog: downtime flappy pages in codfw for 40 minutes
* 17:22 cmjohnson1: shutting down helium for a few minutes to move within the same row
* 15:14 godog: set alert2001 as active in netbox (was staged) - [[phab:T247966|T247966]]
* 16:53 gwicke: bounced cassandra on restbase1004
* 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
* 16:48 godog: reboot ms-be2013 T105213
* 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
* 16:38 gwicke: bounced cassandra on restbase1006
* 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw [[phab:T286787|T286787]]
* 16:07 _joe_: repooling mw1152
* 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
* 15:57 godog: restart cassandra on restbase1002
* 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:34 gwicke: bounced cassandra on restbase1004
* 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
* 15:24 logmsgbot: krenair Synchronized php-1.26wmf12/extensions/ContentTranslation: https://gerrit.wikimedia.org/r/#/c/223739/ (duration: 00m 12s)
* 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
* 15:23 logmsgbot: krenair Synchronized php-1.26wmf13/extensions/ContentTranslation: https://gerrit.wikimedia.org/r/#/c/223737/ (duration: 00m 12s)
* 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 15:23 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/223742/ (duration: 00m 12s)
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 15:09 gwicke: bounced cassandra on restbase1004
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 14:44 gwicke: re-enabled compaction throttling (60mb/s) on cassandra nodes
* 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 14:44 bblack: reprepro: jessie-wikimedia/backports openssl pkg, 1.0.2c-1 => 1.0.2d-1~wmf1
* 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 14:29 _joe_: reimaging mw1152 for wiping any leftover local hacks. Depooling, scheduling downtime
* 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:28 moritzm: installed python-django security updates on labmon, netmon and californium
* 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
* 14:24 godog: really upgrade python-django on graphite2001
* 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
* 13:48 mobrovac: restbase cassandra rolling restart to apply https://gerrit.wikimedia.org/r/223774
* 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers ([[phab:T279309|T279309]])
* 13:02 godog: upgrade python-django on graphite1001 and graphite2001 following  http://www.ubuntu.com/usn/usn-2671-1/
* 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 11:34 godog: restart cassandra on restbase1001
* 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 11:22 logmsgbot: krinkle Synchronized php-1.26wmf13/resources/src/mediawiki/mediawiki.util.js: T105265 (duration: 00m 11s)
* 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 11:21 logmsgbot: krinkle Synchronized php-1.26wmf13/includes/GlobalFunctions.php: T105265 (duration: 00m 12s)
* 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 11:09 mobrovac: restbase deploying https://gerrit.wikimedia.org/r/#/c/223297/ which bumps the back-end module version ( https://github.com/wikimedia/restbase-mod-table-cassandra/pull/117 )
* 12:39 mutante: mw1412 through mw1428 - set to active in netbox ([[phab:T279309|T279309]])
* 10:53 mobrovac: restbase started thinner 15 days for wikimedia group
* 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:37 mark: Shutdown AMS-IX route server BGP sessions on cr1-esams
* 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
* 07:48 logmsgbot: oblivian Synchronized php-1.26wmf13/thumb.php: Re-add fix for thumb.php 404s on HHVM (duration: 00m 13s)
* 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
* 06:27 twentyafterfour: restarted apache2 on iridium to fix phab exception
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
* 06:15 springle: db1037 is repartitioning tables; it will lag intermittently for a day
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
* 06:05 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  9 06:05:30 UTC 2015 (duration 5m 29s)
* 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 05:23 gwicke: dynamically limited cassandra compaction throughput to 80mb/s; please review https://gerrit.wikimedia.org/r/#/c/223722/ to make this permanent
* 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
* 03:01 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-09 03:01:13+00:00
* 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
* 02:58 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 05m 29s)
* 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
* 02:42 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-09 02:42:56+00:00
* 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
* 02:40 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  9 02:40:16 UTC 2015 (duration 40m 15s)
* 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
* 02:36 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 10m 32s)
* 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 02:28 twentyafterfour: restarted phd
* 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 02:28 twentyafterfour: moved phd log to free disk space on iridium
* 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 02:24 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-09 02:24:00+00:00
* 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf13/cache/l10n: (no message) (duration: 00m 34s)
* 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 02:17 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-09 02:17:02+00:00
* 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 00m 47s)
* 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 02:00 springle: pkg upgrade and restart db1037
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 01:49 gwicke: switched remaining cassandra nodes to JDK8
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
* 01:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037 (duration: 00m 11s)
* 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
* 01:07 mutante: uranium - deleted apache logs older than 90 days
* 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
* 00:45 RoanKattouw: Running populateContentModel.php --wiki=cawiki --table=revision --ns=5
* 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 00:20 RoanKattouw: Ran populateContentModel.php --table=revision for odd-numbered namespaces on officewiki for T105245
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
* 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)


== July 8 ==
== 2021-07-15 ==
* 23:07 logmsgbot: catrope Synchronized php-1.26wmf13/extensions/Flow: SWAT (duration: 00m 14s)
* 23:32 brennen: checking stashbot: [[phab:T286756|T286756]]
* 23:06 bd808: Restarted logstash on logstash1001; no hhvm input seen for last hour
* 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: [[gerrit:704815{{!}}Fix creation of mw.Message objects (T286385)]] (duration: 00m 57s)
* 22:56 gwicke: finished rolling restart of cassandra cluster to apply https://gerrit.wikimedia.org/r/#/c/223495/
* 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # [[phab:T285811|T285811]]
* 22:45 mutante: zirconium - stop puppet for role switch
* 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # [[phab:T285811|T285811]]
* 22:33 logmsgbot: legoktm Synchronized php-1.26wmf13/includes/changes/EnhancedChangesList.php: Unbreak missing flags in enhanced RC (duration: 00m 12s)
* 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # [[phab:T285811|T285811]]
* 22:08 logmsgbot: hoo Synchronized php-1.26wmf13/extensions/Wikidata/: Update Wikibase: Fix JavaScript ULS usage (duration: 00m 20s)
* 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
* 21:51 logmsgbot: manybubbles Synchronized php-1.26wmf12/extensions/CirrusSearch/: Stop some fatals in cirrus (duration: 00m 13s)
* 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 21:41 logmsgbot: bd808 Synchronized php-1.26wmf13/includes/api/ApiMain.php: Revert Count API module instantiations and Hook runs (2/2) (duration: 00m 12s)
* 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki [[phab:T284928|T284928]]
* 21:40 logmsgbot: bd808 Synchronized php-1.26wmf13/includes/Hooks.php: Revert Count API module instantiations and Hook runs (1/2) (duration: 00m 12s)
* 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
* 21:39 logmsgbot: bd808 Synchronized php-1.26wmf13/extensions/CirrusSearch/includes/CirrusSearch.php: Suppress interwiki results when they would break (duration: 00m 12s)
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 21:08 bblack: graphite: wiped /var/log/upstart/statsite* logs, restarted statsite processes
* 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 20:56 csteipp: deployed patches for T103022 & T103023
* 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 20:53 csteipp: deployed patch for T94116 for wmf12/wmf13
* 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 20:30 gwicke: added explicit exit 1 in /etc/init.d/cassandra on restbase1008 to prevent cassandra from starting up there; is puppet restarting it?
* 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 20:29 subbu: deployed parsoid sha c4cfc527
* 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 20:15 gwicke: bounced cassandra on restbase1001
* 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 20:05 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  8 20:05:09 UTC 2015 (duration 5m 8s)
* 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
* 19:32 gwicke: stopped cassandra on restbase1008
* 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
* 19:27 logmsgbot: twentyafterfour Synchronized php-1.26wmf13: deploying UniversalLanguageSelector commit 2e0990ac9879 (duration: 01m 58s)
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 19:26 urandom: restbase rolling restart
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 18:21 jgage: ran 'kafka preferred-replica-election' to promote analytics1021 back to Leader
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 18:05 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf13
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 17:16 moritzm: installed libwmf security updates on various systems
* 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:09 gwicke: bounced cassandra on restbase1004
* 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 06s)
* 15:25 mutante: handing over adminship of the "test" mailman list to John F. Lewis (was: Thehelpfulone) due to inactivity
* 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 07s)
* 13:36 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: raise db1041 load (duration: 00m 13s)
* 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
* 12:58 paravoid: manually dpkg -P ferm on potassium
* 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
* 12:52 paravoid: rmmod all iptables/netfilter-related modules from potassium
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:23 godog: bounce cassandra on restbase1004, heap space
* 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
* 11:12 _joe_: mw1153 passed the smoke tests, repooling
* 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 11:08 godog: bounce cassandra on restbase1004 and restbase1005 'cannot achieve consistency level quorum'
* 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
* 10:50 godog: bounce cassandra on restbase1004, death by compaction
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 09:43 ori: _joe_: starting reimaging of mw1153, depooling it and scheduling downtime (at 9:21 UTC)
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 09:42 ori: Nuked /var/lib/carbon/whisper/ResourceLoader on graphite[12]001. Data prior to rollout of I55f0c44cd considered bogus.
* 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
* 09:42 ori: morebots, are you OK?
* 16:40 ejegg: updated payments-wiki from {{Gerrit|d9892207c1}} to {{Gerrit|844b59ee42}}
* 09:41 godog: bounce nutcracker on silver
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 09:33 _joe_: starting reimaging of mw1153, depooling it and scheduling downtime (at 9:21 UTC)
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 09:26 hashar: upgraded plugins on jenkins and restarting it
* 16:27 ejegg: updated fundraising CiviCRM from {{Gerrit|e0d53c92b5}} to {{Gerrit|20e9ef6bbb}}
* 09:06 hashar: Jenkins registering jobs with Zuul
* 16:24 ejegg: updated payments-wiki from {{Gerrit|0e7800027a}} to {{Gerrit|844b59ee42}}
* 08:41 hashar: Jenkins is migrating old build histories. Lot of disk IO happening
* 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 08:11 hashar: shutdowning Jenkins for upgrade.
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 05:57 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  8 05:57:10 UTC 2015 (duration 57m 9s)
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 05:46 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1041, warm up (duration: 00m 13s)
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 02:31 logmsgbot: LocalisationUpdate completed (1.26wmf13) at 2015-07-08 02:31:24+00:00
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 02:16 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-08 02:16:50+00:00
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 02:16 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 00m 48s)
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704773{{!}}flaggedrevs: Allow admins of idwiki to change stablesettings (T268317)]], try II (duration: 01m 05s)
* 15:03 Amir1: temporary becoming admin on idwiki to debug [[phab:T268317|T268317]]
* 15:02 moritzm: installing nginx security updates on ms-fe*
* 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
* 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
* 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
* 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
* 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
* 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
* 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
* 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 13:05 mutante: mw1413 - pooling, was depooled but for unknown reason, dont see it in SAL, looks ok, scap pulled
* 13:03 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1422.eqiad.wmnet
* 13:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[3-5].eqiad.wmnet
* 12:54 mutante: mw1423, mw1424, mw1425 - pooled as new API servers
* 12:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:53 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[3-5].eqiad.wmnet
* 12:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:34 mutante: mw1423, mw1424, mw1425 - scap pull
* 12:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:09 mutante: mw1423,mw1424,mw1425 - rebooting
* 11:48 moritzm: restarting restbase1028-1030 to pick up libuv security update
* 11:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:47 mutante: mw1423, mw1424, mw1425 - initial puppet run, new API appservers going into production
* 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704527{{!}}Make idwiki use protect mode of flaggedrevs (T268317)]] (duration: 01m 07s)
* 11:40 moritzm: restarting Etherpad to pick up libuv security update
* 11:37 moritzm: restarting Turnilo to pick up libuv security update
* 11:34 moritzm: installing libuv1 security updates
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 10 hosts
* 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 11:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 11:05 volans@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 10:56 volans: commented out cron-spam entries on thanos-fe2001, puppet is disabled, thanos-store.service fails to start - [[phab:T285835|T285835]]
* 10:41 godog: move wikibase.queryService.ui.app to wikibase.queryService.ui.index.app - [[phab:T272128|T272128]]
* 10:34 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 10:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 10:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 10:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 10:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:26 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 10:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 10:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 10:02 effie: disableing puppet on maps* for 704394
* 09:38 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:11 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-8].eqiad.wmnet
* 09:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:29 elukey: sudo rm /etc/rawdog/en/feeds/847a7185.state* on planet1002 (corrupted file) - backup in /home/elukey + restart planet-update-en.service
* 08:12 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-8].eqiad.wmnet
* 08:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 08:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 07:48 moritzm: updated bullseye d-i image for latest daily build [[phab:T275873|T275873]]
* 07:31 godog: reimage thanos-fe2001 with bullseye - [[phab:T285835|T285835]]
* 07:23 elukey: restart planet-update-en.service on planet1002
* 07:17 elukey: remove /etc/rawdog/en/<nowiki>{</nowiki>state,state.lock<nowiki>}</nowiki> on planet1002 (following what rawdog suggested) due to corrupted files (backups available in /home/elukey/en)
* 06:51 elukey: restart phabricator_clean_tmp_files.service on phab1001 - transient error (tmp files already cleaned up)
* 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 06s)
* 06:47 tstarling@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 07s)
* 05:50 kart_: Updated cxserver to 2021-07-14-124232-production ([[phab:T282369|T282369]], [[phab:T284450|T284450]])
* 05:47 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:43 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:41 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 00:00 twentyafterfour: phabricator update deployed.


== July 7 ==
== 2021-07-14 ==
* 23:54 jgage: kafka brokers 1018 & 1021 were demoted; i have triggered a leader election and they are leaders again
* 23:23 eileen: civicrm revision changed from {{Gerrit|b1c63470bb}} to {{Gerrit|e0d53c92b5}}, config revision is {{Gerrit|bb405c5232}}
* 23:05 logmsgbot: catrope Synchronized visualeditor-default.dblist: Enable VE by default on labswiki (duration: 00m 12s)
* 21:19 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.14 (duration: 01m 05s)
* 21:56 hoo: Restarted hhvm on mw1003 "Fatal error: Function already defined: wmfLoadInitialiseSettings in /srv/mediawiki/wmf-config/CommonSettings.php on line 187"
* 21:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.14
* 21:16 logmsgbot: krinkle Synchronized php-1.26wmf13/includes/resourceloader/ResourceLoader.php: T104769 (duration: 00m 13s)
* 21:08 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/User.php: Backport: [[gerrit:704609{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 20:53 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf13
* 20:58 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/User.php: Backport: [[gerrit:704608{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 20:00 logmsgbot: twentyafterfour Finished scap: testwiki to php-1.26wmf13 and rebuild l10n cache (duration: 39m 41s)
* 20:51 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.14
* 19:47 gwicke: restarted cassandra on restbase1005
* 19:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/resources: Backport: [[gerrit:704606{{!}}Fix deprecated offset() on invalid DOM (T185629)]] (duration: 01m 07s)
* 19:20 logmsgbot: twentyafterfour Started scap: testwiki to php-1.26wmf13 and rebuild l10n cache
* 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@156a984]: fix trove-dashboard bug (duration: 04m 18s)
* 19:15 moritzm: installed PHP security updates on all trusty hosts
* 19:26 andrew@deploy1002: Started deploy [horizon/deploy@156a984]: fix trove-dashboard bug
* 18:58 ejegg: updated payments from a17ee221db0dbde70c92e24fc188379b6dbad613 to ec34ebf61e5962f66b807abdcb519ff323d41e8e
* 19:17 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 18:08 twentyafterfour: restarted apache2 on iridium (phab hotfix)
* 19:17 nskaggs@cumin1001: Added views for new wiki: dagwiki [[phab:T284456|T284456]]
* 17:10 robh: OTRS update appears to be functioning normally. As such, ending maintenance window.
* 18:55 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 17:06 robh: otrs is now using the new sha256 cert
* 18:54 nskaggs@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
* 17:00 robh: starting otrs maint window
* 18:54 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 16:58 _joe_: restarted HHVM on mw1026, near to OOM
* 18:36 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 16:47 twentyafterfour: applied hotfix for phabricator bug: https://secure.phabricator.com/D13544
* 18:36 nskaggs@cumin1001: Added views for new wiki: banwikisource [[phab:T284390|T284390]]
* 16:36 mutante: protactinium - manual iptables rules replaced by puppet/ferm rules
* 18:30 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 16:11 logmsgbot: thcipriani Synchronized php-1.26wmf12/extensions/ContentTranslation/extension.json: Remove default value for ContentTranslationCampaigns (duration: 00m 12s)
* 18:14 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 15:33 jynus: manually editing table mediawiki.ipblocks to fully solve a former software bug
* 17:52 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 15:12 Jeff_Green: ptr records for frack/codfw and authdns-update
* 17:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 15:10 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable ContentTranslation in enwiki [[gerrit:222991]] (duration: 00m 13s)
* 17:49 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 14:21 jynus: dropping optin_survey_old table from enwiki
* 17:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 13:23 akosiaris: restarting gitblit on antimony
* 17:39 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 11:31 mobrovac: restbase restarted cassandra on rb1005
* 17:35 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 11:26 godog: restart cassandra on restbase1004, heap exhausted
* 17:35 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704383{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 06s)
* 10:49 godog: restarted cassandra on restbase1005, mutations through the roof
* 17:00 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704382{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 05s)
* 08:27 godog: set operations/puppet/cassandra git submodule repo as hidden
* 16:27 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 06:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul  7 06:11:46 UTC 2015 (duration 11m 45s)
* 16:26 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 05:51 logmsgbot: krinkle Synchronized php-1.26wmf12/extensions/WikiEditor/modules/jquery.wikiEditor.toolbar.js: I3e965dda1c4 (duration: 00m 12s)
* 16:11 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Translate: Backport: [[gerrit:704404{{!}}TranslationAid: Handle empty message definition (T285830)]] and [[gerrit:704405{{!}}TranslationAid: Make sure to return successfully fetched definitions (T285830)]] (duration: 01m 09s)
* 02:27 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-07 02:27:55+00:00
* 16:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 06m 09s)
* 15:37 moritzm: installing klibc security updates
* 01:12 ori: Re-pooled mw1152 at 20:46 UTC, did not log it then.
* 15:36 ottomata: deploying eventgate-analytics with direct service-runner promethues support
* 00:41 springle: upgrade db1041 trusty
* 15:34 ryankemper: [Elastic] Manually triggering readahead mitigation across whole fleet to prevent any further issues today: `ryankemper@cumin1001:~$ sudo cumin -b 12 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl restart elasticsearch-disable-readahead.service'` (still need to investigate why `elasticsearch-disable-readahead.timer` isn't re-firing every 30 mins as desired)
* 00:37 logmsgbot: krenair Synchronized php-1.26wmf12/extensions/CentralAuth/includes/CreateLocalAccountJob.php: https://gerrit.wikimedia.org/r/#/c/223211/ (duration: 00m 13s)
* 15:34 moritzm: installing apache security updates on otrs1001 (ticket.wikimedia.org)
* 15:34 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:28 urbanecm: Start server-side upload of 3 large image files ([[phab:T285708|T285708]])
* 15:16 moritzm: installing apache security updates on lists1001 (lists.wikimedia.org)
* 14:51 moritzm: installing apache security updates on puppet masters
* 14:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2384.codfw.wmnet
* 14:47 effie: set mw2384 as inactive to investigate mw2383 issue - [[phab:T286463|T286463]]
* 14:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 14:44 moritzm: installing apache security updates on grafana*
* 14:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 14:43 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 14:40 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:40 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1422.eqiad.wmnet
* 14:33 dcausse: runnning elasticsearch-madvise-random ES_PID on elastic2045
* 14:31 dcausse: runnning elasticsearch-madvise-random 1022 on elastic2054
* 14:23 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:13 elukey: restart php-fpm on mw2370
* 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 13:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 13:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277118|T277118]]
* 13:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277118|T277118]]
* 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1005.eqiad.wmnet
* 12:43 urbanecm: Start server-side upload of 3 large image files ([[phab:T285708|T285708]])
* 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1005.eqiad.wmnet
* 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 12:23 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 12:15 mutante: mw1422 - scap pull
* 12:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1422.eqiad.wmnet
* 12:02 moritzm: upgrading python3-wmflib fleetwide to 0.0.8 (needed for new logout.d wrapper)
* 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
* 12:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
* 11:52 mutante: mw1422 - new setup, not in prod yet
* 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
* 11:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
* 11:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 11:49 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704525{{!}}Remove reviewer user group in ruwiki (T284589)]] (duration: 01m 05s)
* 11:40 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
* 11:39 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:700854{{!}}flaggedrevs: Reduce levels for ruwiki to 1 (T284589)]] (duration: 01m 05s)
* 11:37 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
* 11:23 ariel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|72027e136f10867f5db02043b7505390e49130d1}}: Disable indexing in NS_USER and NS_USER_TALK on bnwiki ([[phab:T286152|T286152]]) (duration: 02m 07s)
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4dc11d2333cbf70a4eb20f3fb94a9e363b41d2df}}: Change category name of Babel extension on Javanese Wikipedia ([[phab:T286165|T286165]]) (duration: 02m 10s)
* 10:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 09:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277118|T277118]]
* 09:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277118|T277118]]
* 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277118|T277118]]
* 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277118|T277118]]
* 09:27 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php-1.37.0-wmf.14]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=testwiki # [[phab:T285811|T285811]]
* 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277118|T277118]]
* 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277118|T277118]]
* 07:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277118|T277118]]
* 07:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277118|T277118]]
* 07:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277118|T277118]]
* 07:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277118|T277118]]
* 07:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T277118|T277118]]
* 07:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T277118|T277118]]
* 00:58 eileen: process control updated to {{Gerrit|c291b3c6890364281d}}
* 00:58 eileen: {{Gerrit|c291b3c6890364281d}}
* 00:49 eileen: civicrm revision changed from {{Gerrit|bb62188ec6}} to {{Gerrit|b1c63470bb}}, config revision is {{Gerrit|c291b3c689}}
* 00:48 eileen: process-control config revision is {{Gerrit|c291b3c689}}
* 00:15 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fix conf cache conflict with Defines.php noticed in beta (duration: 02m 09s)


== July 6 ==
== 2021-07-13 ==
* 23:50 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221989/ (duration: 00m 12s)
* 23:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: {{Gerrit|f3627361ff558c89d4a4452ff24b3457f46a4f46}}: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector ([[phab:T286587|T286587]]) (duration: 02m 08s)
* 23:49 logmsgbot: krenair Synchronized w/static/images/project-logos/mrwikisource.png: https://gerrit.wikimedia.org/r/#/c/221989/ (duration: 00m 13s)
* 23:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: {{Gerrit|f3627361ff558c89d4a4452ff24b3457f46a4f46}}: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector ([[phab:T286587|T286587]]) (duration: 02m 07s)
* 23:35 logmsgbot: krenair Synchronized wmf-config/abusefilter.php: https://gerrit.wikimedia.org/r/#/c/223179/ - should be labs-only (duration: 00m 12s)
* 23:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
* 23:32 logmsgbot: krenair Synchronized README: https://gerrit.wikimedia.org/r/#/c/222941/ - ... (duration: 00m 13s)
* 23:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
* 23:27 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/221809/ - should be a noop, just doc changes (duration: 00m 13s)
* 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
* 23:25 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/221808/ (duration: 00m 13s)
* 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
* 23:17 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/223185/ (duration: 00m 12s)
* 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
* 23:06 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/220970/ (duration: 00m 14s)
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
* 21:46 gwicke: restarted cassandra instance on restbase1003; was low on memory and constantly writing small chunks
* 22:22 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
* 21:30 andrewbogott: rebooting labvirt1005, again. Somehow virtualization is turned off again
* 22:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
* 21:12 subbu: deployed parsoid version 87a746e6
* 22:18 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Use Score with lilypond's safe mode only (duration: 02m 06s)
* 21:04 logmsgbot: ori Synchronized php-1.26wmf12/thumb.php: cdc75debaf: Add Content-Length header to thumb.php error responses (duration: 00m 13s)
* 20:53 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 21:02 mutante: purging static-bz URL on varnish ...
* 20:30 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/skins/Skin.php: Backport: [[gerrit:704368{{!}}links is flat array (T286040)]] (duration: 02m 07s)
* 20:39 akosiaris: upload php5_5.3.10-1ubuntu3.19-wmf1 on apt.wikimedia.org/precise-wikimedia
* 20:26 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.9 (duration: 04m 21s)
* 20:15 gwicke: restart cassandra instance on 1005
* 20:19 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.14 (duration: 31m 56s)
* 20:04 mobrovac: restbase restart cassandra on rb1005
* 19:47 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.14
* 19:28 logmsgbot: krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/223040/ (duration: 00m 12s)
* 19:02 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 19:11 gwicke: reduced compaction throughput from 160 to 100 mb/s across the cassandra cluster via 'nodetool -h <host> setcompactionthroughput 100'
* 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1283.eqiad.wmnet
* 18:51 gwicke: restarted cassandra on restbase1001 with jdk8, see T104888
* 17:45 mutante: mw1283 - decom - powered off by cookbook
* 18:22 gwicke: restarted cassandra on restbase1004 with jdk8
* 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1283.eqiad.wmnet
* 17:54 Jeff_Green: authdns-update for new rigel A record
* 17:41 mutante: homer "asw2-a*eqiad*" commit "decom mw1282 - [[phab:T280203|T280203]]"
* 17:42 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: increase db2029 traffic to normal levels (duration: 00m 12s)
* 17:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
* 17:37 gwicke: upgraded restbase1005 to jdk8
* 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
* 17:35 gwicke: restarting cassandra instance on restbase1005: out of heap
* 17:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
* 17:10 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool db2029 again after conf upgrade(2/2) (duration: 00m 11s)
* 17:09 mutante: mw1282 - decom, powered off
* 17:09 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool db2029 again after conf upgrade (duration: 00m 11s)
* 17:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
* 16:38 jynus: upgrade and restart of db2029
* 17:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1281.eqiad.wmnet
* 16:35 ori: depooled mw1152
* 17:05 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: Backport: [[gerrit:704181{{!}}Do not lock user_preferences before updating (T286521)]] (duration: 01m 58s)
* 15:29 logmsgbot: krenair Finished scap: https://gerrit.wikimedia.org/r/#/c/222993/ (duration: 22m 09s)
* 16:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Firmware upgrade [[phab:T286226|T286226]]
* 15:21 _joe_: repooling mw1152
* 16:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Firmware upgrade [[phab:T286226|T286226]]
* 15:20 _joe_: attempting dump-apc on mw1060
* 16:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade [[phab:T286226|T286226]]
* 15:09 _joe_: depooled the HHVM imagescaler again
* 16:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade [[phab:T286226|T286226]]
* 15:07 logmsgbot: krenair Started scap: https://gerrit.wikimedia.org/r/#/c/222993/
* 16:55 jbond: upload statograph to buster wikimedia
* 15:02 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222617/ (duration: 00m 12s)
* 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1281.eqiad.wmnet
* 14:48 moritzm: installed python security updates on analytics*, lab* and virt*
* 16:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom [[phab:T28203|T28203]]
* 14:46 moritzm: added python-diskimage-builder 0.1.46-1+wmf1 for jessie-wikimedia on carbon
* 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom [[phab:T28203|T28203]]
* 14:43 _joe_: depooled the HHVM imagescaler, spitting 503s again.
* 16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom [[phab:T28203|T28203]]
* 14:18 mobrovac: restbase started thinning out parsoid data (local_group_wikipedia_T_parsoid_dataDVIsgzJSne8k) for >= 22 days
* 16:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom [[phab:T28203|T28203]]
* 14:07 YuviPanda: restart apache on labcontrol1001 to pick up parser function change
* 16:25 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw128[1-3].eqiad.wmnet
* 12:57 moritzm: installed python security updates on mw*, es* and db*
* 15:34 topranks: Adding IX peering to AS393950 (Xiber LLC) on cr2-eqiad.
* 12:18 logmsgbot: hoo Synchronized wmf-config/: Enable WikibaseQuality and WikibaseQualityConstraints on wikidata (duration: 00m 13s)
* 15:20 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 12:15 logmsgbot: hoo Finished scap: Update WikibaseQuality and WikibaseQualityConstraint (duration: 25m 56s)
* 15:19 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 11:49 logmsgbot: hoo Started scap: Update WikibaseQuality and WikibaseQualityConstraint
* 14:52 volker-e@deploy1002: Finished deploy [design/style-guide@5c07233]: Deploy design/style-guide: {{Gerrit|5c07233}} “Components”: Add WikimediaUI theme Figma links to various components (#483) (duration: 00m 06s)
* 11:40 hoo: Created the `wbqc_constraints` table on wikidatawiki
* 14:52 volker-e@deploy1002: Started deploy [design/style-guide@5c07233]: Deploy design/style-guide: {{Gerrit|5c07233}} “Components”: Add WikimediaUI theme Figma links to various components (#483)
* 09:02 _joe_: restarted the appserver on mw1059 with hhvm.server.apc.expire_on_sets = true, restarted the heap profiling to confirm my hypothesis on T104769
* 14:35 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 08:31 _joe_: restarted cassandra on rb1004. again.
* 14:35 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 05:01 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1034, depool db1041 (duration: 00m 12s)
* 13:57 otto@deploy1002: Finished deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job  - [[phab:T271232|T271232]] (duration: 03m 28s)
* 05:00 springle: stash/pull/apply CommonSettings.php on tin, which was left with modifications
* 13:53 otto@deploy1002: Started deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job  - [[phab:T271232|T271232]]
* 04:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul  6 04:35:45 UTC 2015 (duration 35m 44s)
* 13:37 effie: rolling restart php-fpm across clusters - [[phab:T286260|T286260]]
* 02:22 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-06 02:22:12+00:00
* 13:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/lib/includes/SimpleCacheWithBagOStuff.php: Backport: [[gerrit:704176{{!}}Send TTL instead of expiry in unix timestamp in calling BagOStuff (T286260)]] (duration: 00m 58s)
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 06m 07s)
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 13:29 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 13:14 kormat: restarted replication on db1117:3325 [[phab:T284622|T284622]]
* 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 1732 hosts
* 13:10 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 13:10 hashar: Upgraded Apache on gerrit1001 and gerrit2001
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
* 13:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
* 12:53 kormat: stopping replication on db1117:3325 [[phab:T284622|T284622]]
* 12:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 [[phab:T284622|T284622]]
* 12:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 [[phab:T284622|T284622]]
* 12:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 12:41 mutante: depooling and decom'ing eqiad API servers mw1281, mw1282, mw1283 - [[phab:T280203|T280203]]
* 12:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[1-3].eqiad.wmnet
* 12:20 mutante: mwmaint1002 - scap pull after reimaging
* 11:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
* 11:28 Lucas_WMDE: EU backport+config window done
* 11:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:704304{{!}}Remove obsolete $wgShowDBErrorBacktrace config]] (duration: 01m 25s)
* 11:13 mutante: mwmaint1002 - reimaging with buster ([[phab:T267607|T267607]])
* 10:54 mutante: switching https://noc.wikimedia.org backened from eqiad to codfw for mwmaint1002 OS upgrade, not affecting config-master/pybal, tests passed ([[phab:T267607|T267607]])
* 10:44 moritzm: upgrading apache on phab1001 (phabricator.wikimedia.org)
* 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:39 hnowlan: running `nodetool decommission` on maps2008
* 10:27 moritzm: installing apache security updates on alert1001 (icinga.wikimedia.org)
* 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277116|T277116]]
* 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277116|T277116]]
* 10:18 moritzm: installing apache security updates on Logstash hosts
* 09:58 moritzm: upgrading PHP/Apache on matomo1002 (piwik.wikimedia.org)
* 09:40 moritzm: installing apache security updates on thanos-fe hosts
* 09:38 moritzm: installing apache security updates on parsoid hosts
* 09:31 effie: depool mw2383 [[phab:T286463|T286463]]
* 09:18 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 09:15 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277116|T277116]]
* 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277116|T277116]]
* 08:59 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
* 08:59 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
* 08:45 effie: depool mw2383 - [[phab:T286463|T286463]]
* 08:02 moritzm: upgrade bullseye pilot installs to latest state of bullseye
* 07:06 moritzm: installing apache security updates on codfw mw* hosts
* 06:53 elukey: systemctl reset-failed ifup@ens5 on gitlab2001 - [[phab:T273026|T273026]]
* 06:06 effie: pool mw2383  - [[phab:T286463|T286463]]
* 04:09 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:05 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 08m 28s)
* 03:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
* 03:55 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 02m 22s)
* 03:54 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.76` on canary `wdqs1003`; proceeding to rest of fleet
* 03:53 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
* 03:53 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.76`. Pre-deploy tests passing on canary `wdqs1003`


== July 5 ==
== 2021-07-12 ==
* 22:30 bd808: Restarted logstash on logstah1001; Hung due to OOM errors
* 23:57 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1896efc27f3de39659673091bc4c43ad874da0c5}}: Add sayahna.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T286163|T286163]]) (duration: 00m 56s)
* 22:03 mobrovac: restbase rolling restart of restbase
* 23:51 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=[[phab:T286396|T286396]] # [[phab:T286396|T286396]]
* 18:11 logmsgbot: krenair Synchronized docroot/noc: https://gerrit.wikimedia.org/r/#/c/222932/ (duration: 00m 12s)
* 23:50 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # [[phab:T286396|T286396]]
* 17:49 logmsgbot: krenair Synchronized docroot/noc/conf: https://gerrit.wikimedia.org/r/#/c/222290/ (duration: 00m 13s)
* 23:50 urbanecm: Delete Project:BROKENPesak at sr.wikipedia to be able to rerun namespaceDupes.php ([[phab:T286396|T286396]])
* 17:44 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221600/ (duration: 00m 12s)
* 23:45 urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # [[phab:T286396|T286396]]
* 15:16 YuviPanda: restarted nutcracker on silver.
* 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|284216a7d35c815ea203a9c0bd738a1e1bf31f7e}}: Add few namespace aliases for Serbian Wikipedia ([[phab:T286396|T286396]]) (duration: 00m 56s)
* 12:55 mobrovac: restbase rolling restart of cassandra to apply the 16G heap change https://gerrit.wikimedia.org/r/222899
* 23:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8a79bf752ff5eb15f3042fd94ba10c2c50607a85}}: enwiki: Delete Book namespace ([[phab:T285766|T285766]]) (duration: 00m 57s)
* 11:21 _joe_: restarted cassandra on restbase1004 (again), seemingly crashed for a bad request
* 23:29 urbanecm@deploy1002: Synchronized static/images/: {{Gerrit|d007b9ccb77db9f3dc492df7a35477e5563a921a}}: Remove unused celebration logos and wordmark ([[phab:T286380|T286380]]) (duration: 00m 57s)
* 11:03 _joe_: restarting cassandra on rb1003,4 and restbase on rb1002,3
* 23:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6c581493fbe5d9c372fd44635b704d04040d8b38}}: Add editautoreviewprotected to bot on hewikisource ([[phab:T275076|T275076]]) (duration: 00m 57s)
* 09:43 bblack: restarted restbase on restbase1005
* 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|40eade4131eac95ba3dc0d918ad540070d7bcb99}}: Enable RelatedArticles Extension in zhwikinews ([[phab:T266933|T266933]]) (duration: 00m 57s)
* 08:40 _joe_: collecting heaps on an api appserver, mw1115, as comparison
* 23:15 urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=zhwiktionary --fix --add-prefix=BROKEN # [[phab:T286101|T286101]], P16817
* 08:29 _joe_: restaarted HHVM on mw1059 with heap profiling enabled, collecting data (will stop this evening).
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5ab00d188bc4161e40455b842f613698548b3518}}: zhwiktionary: Add templateeditor right ([[phab:T286101|T286101]]) (duration: 00m 57s)
* 08:27 bblack: FYI: 08:15 < grrrit-wm> (CR) BBlack: [C: 2 V: 2] filter S:RI from wm2015register T45250 [puppet] - https://gerrit.wikimedia.org/r/222879 (owner: BBlack)
* 23:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5822b2be129b934939af46bab5b8916039661e97}}: zhwiktionary: Add aliases for namespaces ([[phab:T286101|T286101]]) (duration: 00m 57s)
* 08:23 _joe_: restarted hhvm because of ooms, not apache
* 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ba0967f5c18652d02b7b476e9592b81dcb9b74fc}}: zhwiktionary: Add Reconstruction namespace ([[phab:T286101|T286101]]) (duration: 00m 57s)
* 08:23 _joe_: restarted apache on mw1105,mw1092,90,82,78
* 22:53 legoktm: root@urldownloader2002:/var/cache/apt# rm -rf * to free up space
* 07:09 bblack: restarted cassandra on restbase1004
* 21:26 urbanecm: Start server-side upload for 2 video files ([[phab:T286432|T286432]], [[phab:T286433|T286433]])
* 07:07 bblack: restarted cassandra + restbase on restbase1005
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@200b502]: Finalize event_default gobblin job  - [[phab:T271232|T271232]] (duration: 03m 39s)
* 07:01 jynus: Restarted HHVM for mw1112,1028,1057,1061,1069,1070,1084,1086
* 18:37 otto@deploy1002: Started deploy [analytics/refinery@200b502]: Finalize event_default gobblin job  - [[phab:T271232|T271232]]
* 02:57 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-05 02:57:28+00:00
* 18:12 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score using Shellbox on testwiki ([[phab:T257066|T257066]]) (duration: 00m 58s)
* 16:15 ppchelko@deploy1002: Finished deploy [restbase/deploy@b05ade3]: Add newly created wikis [[phab:T284929|T284929]] [[phab:T284457|T284457]] [[phab:T284392|T284392]] (duration: 21m 24s)
* 16:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]] - extending downtime
* 16:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]] - extending downtime
* 15:54 ppchelko@deploy1002: Started deploy [restbase/deploy@b05ade3]: Add newly created wikis [[phab:T284929|T284929]] [[phab:T284457|T284457]] [[phab:T284392|T284392]]
* 15:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]]
* 15:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]]
* 15:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277116|T277116]]
* 15:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277116|T277116]]
* 15:24 elukey: expand ML k8s iBGP neighbors to include the master nodes (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/704104)
* 15:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277116|T277116]]
* 15:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277116|T277116]]
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1002.wikimedia.org
* 15:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277116|T277116]]
* 15:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277116|T277116]]
* 15:00 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1002.wikimedia.org
* 14:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change [[phab:T277116|T277116]]
* 14:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change [[phab:T277116|T277116]]
* 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1001.wikimedia.org
* 14:44 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1001.wikimedia.org
* 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2004.wikimedia.org
* 14:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2004.wikimedia.org
* 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2003.wikimedia.org
* 14:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2003.wikimedia.org
* 14:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 13:59 otto@deploy1002: Finished deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo  - [[phab:T271232|T271232]] (duration: 03m 30s)
* 13:56 otto@deploy1002: Started deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo  - [[phab:T271232|T271232]]
* 13:52 otto@deploy1002: Finished deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - [[phab:T271232|T271232]] (duration: 03m 16s)
* 13:49 otto@deploy1002: Started deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - [[phab:T271232|T271232]]
* 13:36 otto@deploy1002: Finished deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - [[phab:T271232|T271232]] (duration: 03m 37s)
* 13:32 otto@deploy1002: Started deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - [[phab:T271232|T271232]]
* 12:51 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:48 volans@cumin2002: START - Cookbook sre.dns.netbox
* 12:42 volans: reverting Primary IP allocation for pc1011-1014, leaving only mgmt IPs - [[phab:T282484|T282484]]
* 12:34 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps2004.codfw.wmnet
* 11:58 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:703567{{!}}Enable template search improvements on first wikis 2/2 (T284553)]] (duration: 00m 57s)
* 11:54 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:703566{{!}}Enable template search improvements on first wikis 1/2 (T284553)]] (duration: 00m 56s)
* 11:49 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/VisualEditor/modules/ve-mw/ui/widgets/ve.ui.MWTemplateTitleInputWidget.js: Backport: [[gerrit:703649{{!}}Always add 1 prefixsearch match when searching for templates]] (duration: 00m 57s)
* 11:47 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps100[1-4].eqiad.wmnet
* 11:45 hnowlan: adjusting weights of eqiad maps servers to reduce load on older spec machines
* 11:40 moritzm: installing apache updates on mw1/eqiad hosts
* 11:38 hnowlan: adjusting weights of codfw maps servers to reduce load on older spec machines
* 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2004.codfw.wmnet
* 11:34 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|773c956811cba5c3a2cbba32bc1e1a536dbd9f0b}}: Revert "Use ptwiki 20th anniversary logos" ([[phab:T286380|T286380]]) (duration: 00m 57s)
* 11:34 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2003.codfw.wmnet
* 11:33 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2001.codfw.wmnet
* 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cd5f5375b4f712c56e9396cc550078272ef668de}}: Revert "ptwiki: Use celebration logos in new vector" ([[phab:T286380|T286380]]) (duration: 00m 57s)
* 11:26 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:702761{{!}}Add 'editautoreviewprotected' protection level to hewikisource (T275076)]] (duration: 00m 57s)
* 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 11:19 hnowlan: testing a depool of maps2010 to ensure kartotherian load can cope with two less nodes
* 11:12 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:703568{{!}}Enable transclusion back button on first wikis (T284553)]] (duration: 00m 58s)
* 11:01 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2008.codfw.wmnet
* 10:58 hnowlan: testing a depool of maps2008 to ensure kartotherian load can cope with one less node
* 10:30 moritzm: installing apache updates on an-tool* hosts (affects Turnilo, Yarn, Superset, Hue) briefly
* 10:11 elukey: add 10g disk to ml-serve-ctrl[12]00[12] for [[phab:T285927|T285927]]
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
* 10:05 mutante: planet - deleting state files, manually running update for all 161 en feeds - [[phab:T285251|T285251]]
* 10:03 effie: depool mw2383
* 10:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
* 10:01 godog: test thanos-compact upload with smaller part size - [[phab:T285835|T285835]]
* 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
* 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
* 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1006.eqiad.wmnet
* 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
* 09:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
* 09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1006.eqiad.wmnet
* 09:07 godog: repool thanos-fe2002 - [[phab:T285835|T285835]]
* 08:38 godog: test a single frontend for thanos-swift / thanos-query to test "bad host" theory - [[phab:T285835|T285835]]
* 08:26 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/client: Backport: [[gerrit:703890{{!}}Remove subscribing to other aspect for entity usage (T286193)]] (duration: 00m 59s)
* 07:44 jynus: restart db1102:x1 mariadb instance
* 07:01 moritzm: installing apache2 security updates
* 05:14 Amir1: start of mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --batch-size=10 --verbose --mime="application/pdf" --force --sleep 5 on screen - It will take days / week to finish ([[phab:T275268|T275268]])
* 05:06 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: [[gerrit:703951{{!}}Enable json image metadata everywhere (T275268)]] (duration: 01m 05s)
* 04:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/maintenance/refreshImageMetadata.php: Backport: [[gerrit:703891{{!}}Add --sleep option to refreshImageMetadata.php]] (duration: 01m 04s)
* 04:10 Amir1: mwscript refreshImageMetadata.php --wiki=testcommonswiki --mediatype=OFFICE --batch-size=20 --verbose --mime="application/pdf" --force ([[phab:T275268|T275268]])
* 04:08 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: [[gerrit:703950{{!}}Set testcommonswiki to use json image metadata (T275268)]] (duration: 01m 10s)


== July 4 ==
== 2021-07-09 ==
* 23:49 Krenair: Ran "mwscript updateSpecialPages.php labswiki --override --only=Wantedpages" on silver, completed in 0.44 seconds
* 23:28 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:44 Krenair: test morebots
* 23:27 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 21:22 YuviPanda: restarted cassandra on restbase1004 per urandom
* 22:36 legoktm: running benchmarking scripts again shellbox
* 19:15 YuviPanda: restarted cassandra on restbase1001
* 14:49 otto@deploy1002: Finished deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - [[phab:T271232|T271232]] (duration: 03m 08s)
* 17:15 _joe_: restarted cassandra on restbase1001
* 14:46 otto@deploy1002: Started deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - [[phab:T271232|T271232]]
* 16:12 logmsgbot: krenair Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 10m 35s)
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1118', diff saved to https://phabricator.wikimedia.org/P16809 and previous config saved to /var/cache/conftool/dbconfig/20210709-115609-marostegui.json
* 12:56 logmsgbot: krinkle Synchronized php-1.26wmf12/resources/src/mediawiki/mediawiki.Title.js: I1dae1e63e47 (duration: 00m 17s)
* 11:40 _joe_: deleting coredns pod in codfw, potentially causing [[phab:T286360|T286360]]
* 05:01 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul  4 05:01:43 UTC 2015 (duration 1m 42s)
* 10:13 _joe_: recreated all pods for zotero in codfw
* 03:11 ori: Promoted Krinkle and Krenair to admin, cloudadmin on wikitech, because duh.
* 00:47 legoktm: zotero rolling restart didn't help, filed [[phab:T286360|T286360]] for DNS issues
* 02:39 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-04 02:39:41+00:00
* 00:39 legoktm: doing a rolling restart of zotero in codfw to hopefully fix DNS ENOTFOUND issues
* 02:29 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 09m 59s)
* 01:00 springle: reload haproxy dbproxy1004


== July 3 ==
== 2021-07-08 ==
* 23:59 logmsgbot: legoktm Synchronized php-1.26wmf12/extensions/Translate/: Translate+UserMerge fixes (duration: 00m 17s)
* 22:48 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Add configuration to use Score with Shellbox (still disabled) (2/2) - [[phab:T281423|T281423]] (duration: 00m 57s)
* 23:55 logmsgbot: legoktm Synchronized php-1.26wmf12/extensions/WikiLove/: WikiLove+UserMerge fixes (duration: 00m 18s)
* 22:46 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add configuration to use Score with Shellbox (still disabled) (1/2) - [[phab:T281423|T281423]] (duration: 00m 58s)
* 23:24 logmsgbot: ori Synchronized w/404.php: Force 'Transfer-Encoding: Chunked' header on 404 responses (duration: 00m 31s)
* 19:29 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/includes/Score.php: Allow setting a different path for `convert` just for Score (2/2) (duration: 00m 57s)
* 22:36 Krenair: restarted apache on silver to see if it would make https://gerrit.wikimedia.org/r/#/c/221969/ take effect for T104360. It did not.
* 19:27 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/extension.json: Allow setting a different path for `convert` just for Score (1/2) (duration: 00m 58s)
* 21:46 ori: depooled mw1152
* 18:56 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 20:12 ori: restarted cassandra on restbase1001
* 18:55 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 17:28 ori: pooled mw1152 (HHVM image scaler) for debugging.
* 18:53 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 17:05 logmsgbot: krenair Synchronized php-1.26wmf12/extensions/Collection/RenderingAPI.php: https://gerrit.wikimedia.org/r/#/c/222616/ - hoping this fixes T104708 (duration: 00m 44s)
* 17:02 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1] (duration: 05m 38s)
* 15:35 YuviPanda: cd /mnt/backup/others-20150703/ ; tar --acls --xattrs -cpf - . | pv -L 80M -C -p -r -e -b -t -B 32M -T | ssh -c chacha20-poly1305@openssh.com -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -C -B 32M | tar --acls --xattrs -xpf - -C /srv/backup-others-20150703" on labstore1002
* 16:56 joal@deploy1002: Started deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1]
* 15:35 YuviPanda: mount /dev/mapper/backup-others--20150703 /srv/backup-others-20150703/ on labstore2001
* 16:47 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1] (duration: 03m 17s)
* 15:34 YuviPanda: mkdir /srv/backup-others-20150703 on labstore2001
* 16:44 joal@deploy1002: Started deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1]
* 15:33 YuviPanda: mkfs -t ext4 /dev/mapper/backup-others--20150703 on labstore2001 completed
* 15:37 otto@deploy1002: Finished deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - [[phab:T271232|T271232]] (duration: 03m 06s)
* 15:33 YuviPanda: run mount -o ro /dev/mapper/labstore-others--20150703 /mnt/backup/others-20150703/ on labstore1002
* 15:34 otto@deploy1002: Started deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - [[phab:T271232|T271232]]
* 15:32 YuviPanda: run mkdir /mnt/backup/others-20150703 on labstore1002
* 15:29 otto@deploy1002: Finished deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - [[phab:T271232|T271232]] (duration: 05m 27s)
* 15:31 YuviPanda: run  lvcreate -L 640G -s -n others-20150703 labstore/others on labstore1002
* 15:23 otto@deploy1002: Started deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - [[phab:T271232|T271232]]
* 15:29 YuviPanda: running mkfs -t ext4 /dev/mapper/backup-others--20150703 on labstore2001
* 15:11 otto@deploy1002: Finished deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - [[phab:T271232|T271232]] (duration: 05m 42s)
* 15:28 YuviPanda: run lvcreate -L 3.5T -n others-20150703 backup on labstore2001
* 15:05 otto@deploy1002: Started deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - [[phab:T271232|T271232]]
* 15:25 YuviPanda: begin process of backing up others (all labs projects except tools) on to labstore2001 from labstore1002
* 14:52 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add consumers.analytics_hadoop-ingestion stream config settings for automated gobblin imports - [[phab:T271232|T271232]] [[phab:T273901|T273901]] (duration: 01m 09s)
* 14:06 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool db1022 (low traffic) (duration: 00m 54s)
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16807 and previous config saved to /var/cache/conftool/dbconfig/20210708-134421-root.json
* 13:27 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool db2047 after maintenance (duration: 00m 22s)
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16806 and previous config saved to /var/cache/conftool/dbconfig/20210708-132917-root.json
* 13:27 YuviPanda: run cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs -cpf - . | pv -L 80M -C -p -r -e -b -t -B 32M -T | ssh -c chacha20-poly1305@openssh.com -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -C -B 32M | tar --acls --xattrs -xpf - -C /srv/backup-tools-20150703" on labstore1002
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16805 and previous config saved to /var/cache/conftool/dbconfig/20210708-131414-root.json
* 13:27 YuviPanda: interrupting tar |ssh | tar script and cleaning out destination again
* 13:04 otto@deploy1002: Finished deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - [[phab:T271232|T271232]] (duration: 03m 22s)
* 13:17 YuviPanda: clean out tar | ssh | tar target on labstore2001
* 13:01 otto@deploy1002: Started deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - [[phab:T271232|T271232]]
* 13:15 YuviPanda: /dev/null filled up on labstore1002, aborting pipe of valuable user data into it.
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16804 and previous config saved to /var/cache/conftool/dbconfig/20210708-125910-root.json
* 13:13 YuviPanda: run cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs -cpf - . | pv -L 80M -C -p -r -e -b -t -B 32M -T > /dev/null on labstore1002
* 12:52 moritzm: installing klibc security updates on buster
* 13:02 YuviPanda: run cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs -cpf - . | pv -L 80M -C -p -r -e -b -t -B 32M -T | ssh -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -C -B 32M | tar --acls --xattrs -xpf - -C /srv/backup-tools-20150703" on labstore1002
* 12:38 moritzm: installing openexr security updates
* 13:02 YuviPanda: interrupt tar | ssh | tar on labstore1002 and killed dest on labstore2001
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103', diff saved to https://phabricator.wikimedia.org/P16803 and previous config saved to /var/cache/conftool/dbconfig/20210708-105353-marostegui.json
* 12:43 YuviPanda: cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs -cpf - . | pv -L 80M -p -r -e -b -t -B 32M -T | ssh -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -B 32M | tar --acls --xattrs -xpf - -C /srv/backup-tools-20150703" on screen on labstore1002
* 10:20 jbond: upgrade golang-cfssl
* 12:43 mobrovac: restbase deploying restbase/deploy @ 1a826a5
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16802 and previous config saved to /var/cache/conftool/dbconfig/20210708-100947-root.json
* 12:42 YuviPanda: interrupt tar | ssh | tar on labstore1002, clean out destination on labstore2001
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16801 and previous config saved to /var/cache/conftool/dbconfig/20210708-095443-root.json
* 12:36 YuviPanda: interrupted tar | ssh | tar on labstore1002 and cleaned out dest on labstore2001
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16800 and previous config saved to /var/cache/conftool/dbconfig/20210708-093939-root.json
* 12:35 YuviPanda: cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs -cpf - . | pv -L 80M -p -r -e -b -t -B 16M | ssh -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -B 16M | tar --acls --xattrs -xpf - -C /srv/backup-tools-20150703" in screen on labstore1002
* 09:25 jbond: upload golang-github-cloudflare-cfssl_1.6.0-1_amd64 to bullseye
* 12:33 YuviPanda: rm -rf /srv/backup-tools-20150703/* on labstore2001
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16799 and previous config saved to /var/cache/conftool/dbconfig/20210708-092436-root.json
* 12:31 mark: labstore2001: mount /srv/backup -o remount,ro
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2116', diff saved to https://phabricator.wikimedia.org/P16798 and previous config saved to /var/cache/conftool/dbconfig/20210708-092411-marostegui.json
* 12:31 YuviPanda: interrupt tar | ssh | tar on labstore1002
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16797 and previous config saved to /var/cache/conftool/dbconfig/20210708-090456-root.json
* 12:29 YuviPanda: cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs -cpf - . | ssh -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -L 80M -p -r -e -b -t -B 16M | tar --acls --xattrs -xpf - -C /srv/backup-tools-20150703" on labstore1002
* 09:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:28 YuviPanda: cd /mnt/backup/tools-20150703/ ; tar --acls --xattrs cpf - . | ssh -i ~/.ssh/id_labstore root@labstore2001.codfw.wmnet "pv -L 80M -p -r -e -b -t -B 16M | tar --acls --xattrs xpf - -C /srv/backup-tools-20150703" on labstore1002
* 08:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:09 YuviPanda: running mount -o ro /dev/mapper/labstore-tools--20150703 /mnt/backup/tools-20150703/ now
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16796 and previous config saved to /var/cache/conftool/dbconfig/20210708-084952-root.json
* 11:57 YuviPanda: run  lvcreate -L 640G -s -n tools-20150703 labstore/tools on labstore1002
* 08:50 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:50 YuviPanda: running  lvcreate -L 640G -s tools -n tools-20150703 labstore on labstore1002
* 08:42 moritzm: imported ganeti 2.16.0 for stretch-security/component/ganeti216 [[phab:T284811|T284811]]
* 11:26 YuviPanda: umount /mnt/backup/project/tools/ on labstore1002
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16795 and previous config saved to /var/cache/conftool/dbconfig/20210708-083449-root.json
* 11:24 YuviPanda: ran mount /dev/mapper/backup-tools--20150703 /srv/backup-tools-20150703/ on labstore2001
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16794 and previous config saved to /var/cache/conftool/dbconfig/20210708-081945-root.json
* 11:22 YuviPanda: mkdir /srv/backup-tools-20150703 on labstore2001
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2130', diff saved to https://phabricator.wikimedia.org/P16793 and previous config saved to /var/cache/conftool/dbconfig/20210708-081922-marostegui.json
* 11:13 YuviPanda: run mkfs -t ext4 /dev/mapper/backup-tools--20150703  on labstore2001
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16792 and previous config saved to /var/cache/conftool/dbconfig/20210708-060812-root.json
* 11:09 YuviPanda: lvcreate -L 6TB -n tools-20150703 backup on labstore2001
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16791 and previous config saved to /var/cache/conftool/dbconfig/20210708-055309-root.json
* 11:09 jynus: reimports finished on dbstore2* hosts and puppet reenabled after T104471 was fixed
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16790 and previous config saved to /var/cache/conftool/dbconfig/20210708-053805-root.json
* 10:56 mobrovac: restbase disabling puppet on restbase1005 to tweak JVM params for cassandra
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16789 and previous config saved to /var/cache/conftool/dbconfig/20210708-052302-root.json
* 10:50 YuviPanda: started du of maps project on labstore2001
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092', diff saved to https://phabricator.wikimedia.org/P16788 and previous config saved to /var/cache/conftool/dbconfig/20210708-052216-marostegui.json
* 09:36 mobrovac: restbase restarting cassandra on rb1002
* 06:19 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul  3 06:19:02 UTC 2015 (duration 19m 1s)
* 02:50 urandom: restbase rolling restart
* 02:49 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-03 02:49:31+00:00
* 02:42 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 11m 43s)
* 02:06 logmsgbot: ori Synchronized php-1.26wmf12/extensions/CentralAuth: I0e5f2d3b2: Updated mediawiki/core Project: mediawiki/extensions/CentralAuth  7f8da7139714dd5089dd03e8679aba25c2c89c4d (duration: 00m 15s)


== July 2 ==
== 2021-07-07 ==
* 22:34 logmsgbot: legoktm Synchronized php-1.26wmf12/extensions/CentralAuth/: Made use of new USE_MULTI_COMMIT flag in user merge jobs (duration: 00m 18s)
* 20:22 legoktm: repooling eqiad - https://gerrit.wikimedia.org/r/703561
* 22:31 logmsgbot: legoktm Synchronized php-1.26wmf12/extensions/UserMerge/:  Added USE_MULTI_COMMIT flag to enable query batching (duration: 00m 26s)
* 18:06 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add Shellbox to <nowiki>{</nowiki>Production,Labs<nowiki>}</nowiki>Services.php (2/2) (duration: 00m 59s)
* 21:51 logmsgbot: legoktm Synchronized php-1.26wmf12/extensions/Interwiki/Interwiki_body.php: Add missing global $wgInterwikiViewOnly declaration (duration: 00m 15s)
* 18:05 legoktm@deploy1002: Synchronized wmf-config/LabsServices.php: Add Shellbox to <nowiki>{</nowiki>Production,Labs<nowiki>}</nowiki>Services.php (1/2) (duration: 00m 59s)
* 21:37 twentyafterfour: restarted apache2 or iridium after applying hotfix for phabricator css issue
* 18:04 otto@deploy1002: Finished deploy [analytics/refinery@46c0b84] (hadoop-test): Deploy for gobblin migration - Refine now supports gzip - [[phab:T271232|T271232]] (duration: 05m 28s)
* 21:22 logmsgbot: legoktm Synchronized php-1.26wmf12/extensions/CentralNotice/: https://gerrit.wikimedia.org/r/222484 (duration: 00m 15s)
* 17:59 legoktm@deploy1002: Synchronized private/readme.php: Document $wgShellboxSecretKey in private/readme.php (duration: 01m 01s)
* 21:16 cwdent: updated civicrm from 4fe0648ea9f36282731bf651a59ca1a617db6c08 to 04efc7d5c7bbb068f907125f2184692aee676123
* 17:58 otto@deploy1002: Started deploy [analytics/refinery@46c0b84] (hadoop-test): Deploy for gobblin migration - Refine now supports gzip - [[phab:T271232|T271232]]
* 20:47 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Disable global merge (duration: 00m 14s)
* 17:54 otto@deploy1002: Finished deploy [analytics/refinery@46c0b84]: Deploy for gobblin migration - Refine now supports gzip - [[phab:T271232|T271232]] (duration: 17m 22s)
* 20:13 andrewbogott: restarted keystone on labcontrol1001
* 17:36 otto@deploy1002: Started deploy [analytics/refinery@46c0b84]: Deploy for gobblin migration - Refine now supports gzip - [[phab:T271232|T271232]]
* 18:54 bd808: Running sync-common on mw1111; fatal log showed it to be running 1.26wmf9
* 16:55 joal@deploy1002: Finished deploy [analytics/refinery@b5c4462]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@b5c4462] (duration: 03m 10s)
* 18:30 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf12
* 16:52 joal@deploy1002: Started deploy [analytics/refinery@b5c4462]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@b5c4462]
* 18:02 YuviPanda: running exportfs -ra on labstore1002
* 16:28 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:40 bd808: Restarted logstash on logstash1001 due to OOM
* 16:15 joal@deploy1002: Finished deploy [analytics/refinery@b5c4462] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@b5c4462] (duration: 10m 21s)
* 16:05 bblack: cp1065 undowntimed/repooled
* 16:05 joal@deploy1002: Started deploy [analytics/refinery@b5c4462] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@b5c4462]
* 16:04 YuviPanda: clean out exports.d in labstore1002, will get regenerated. backup in /root/exports.backup
* 16:03 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:18 logmsgbot: anomie Synchronized php-1.26wmf12/extensions/Wikidata/: SWAT: Update Wikibase: SearchEntities return 'aliases' when not same as label [[gerrit:222311]] (duration: 00m 20s)
* 16:01 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:18 YuviPanda: killed icinga-wm again
* 15:25 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:17 bblack: depooled cp1065 in pybal/puppet
* 15:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:57 mutante: restarting gitblit on antimony for the 123443th time
* 14:49 moritzm: installing djvulibre security updates
* 14:54 mutante: restarted apache on strontium
* 14:05 _joe_: powercycling mw2267, stuck witout network, blank console
* 14:50 YuviPanda: killed icinga-wm for a bit
* 13:25 otto@deploy1002: Finished deploy [analytics/refinery@8de71e6] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin dir fixes - [[phab:T271232|T271232]] (duration: 05m 41s)
* 14:43 YuviPanda: kicked puppetmaster on palladium
* 13:19 otto@deploy1002: Started deploy [analytics/refinery@8de71e6] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin dir fixes - [[phab:T271232|T271232]]
* 14:28 YuviPanda: restarted apache on labcontrol1001
* 13:13 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:14 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: depool db2029 again: T104573 (duration: 00m 12s)
* 13:13 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:58 urandom: restarted restbase1005.eqiad
* 13:12 otto@deploy1002: Finished deploy [analytics/refinery@8de71e6]: analytics cluster deploy for webrequest gobblin job migration - [[phab:T271232|T271232]] (duration: 03m 11s)
* 13:49 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool db2029; depool db2047 for maintenance (duration: 00m 13s)
* 13:09 otto@deploy1002: Started deploy [analytics/refinery@8de71e6]: analytics cluster deploy for webrequest gobblin job migration - [[phab:T271232|T271232]]
* 11:19 mobrovac: restbase restarting cassandra on rb1005
* 12:12 urbanecm: Start server-side upload for 3 video files ([[phab:T286173|T286173]], [[phab:T286175|T286175]], [[phab:T286174|T286174]])
* 07:06 logmsgbot: krinkle Synchronized w/touch.php: T104538 (duration: 00m 11s)
* 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx1002.wikimedia.org
* 07:05 logmsgbot: krinkle Synchronized w/favicon.php: T104538 (duration: 00m 11s)
* 11:49 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host mx1002.wikimedia.org
* 06:34 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Emergency depool of db2029 (duration: 00m 12s)
* 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx2002.wikimedia.org
* 06:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul  2 06:27:57 UTC 2015 (duration 27m 56s)
* 11:29 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host mx2002.wikimedia.org
* 04:18 ori: depooled mw1152.
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16782 and previous config saved to /var/cache/conftool/dbconfig/20210707-112149-root.json
* 03:38 logmsgbot: krinkle Synchronized docroot/default/index.html: 6d49d229806 (duration: 00m 12s)
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16781 and previous config saved to /var/cache/conftool/dbconfig/20210707-110645-root.json
* 03:37 logmsgbot: krinkle Synchronized 404.html: 6d49d229806 (duration: 00m 12s)
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16780 and previous config saved to /var/cache/conftool/dbconfig/20210707-105142-root.json
* 03:14 logmsgbot: legoktm Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 12s)
* 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16779 and previous config saved to /var/cache/conftool/dbconfig/20210707-103638-root.json
* 02:54 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-02 02:54:06+00:00
* 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316', diff saved to https://phabricator.wikimedia.org/P16778 and previous config saved to /var/cache/conftool/dbconfig/20210707-103553-marostegui.json
* 02:52 logmsgbot: krinkle Synchronized docroot and w: 245a1ff (duration: 00m 12s)
* 07:56 moritzm: bounced elasticsearch_5@production-logstash-eqiad on logstash1009
* 02:51 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 05m 19s)
* 07:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:37 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-07-02 02:37:03+00:00
* 02:30 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 10m 23s)
* 00:44 ori: Repooling mw1152 (HHVM image scaler) for testing)


== July 1 ==
== 2021-07-06 ==
* 23:30 springle: restart mysqld dbstore2002 T104471
* 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 23:06 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/222202/ (duration: 00m 11s)
* 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:39 godog: bounce gitblit
* 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 20:38 jgage: restarted gitblit on antimony
* 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 19:50 ori: restarted gitblit on antimony
* 17:25 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0] (duration: 05m 31s)
* 19:49 ori: mw1152 not actually re-pooled because of ongoing work on palladium. I'm undoing the change and hanging back now.
* 17:20 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0]
* 19:41 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf12
* 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0] (duration: 00m 07s)
* 19:36 logmsgbot: twentyafterfour Synchronized php-1.26wmf12: sync 1.26wmf12 branch revert of "Implement support for Google reCAPTCHA 2.0" 90665a737bc25ff3c859044755d662c6cd700573 (duration: 02m 04s)
* 17:19 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0]
* 19:31 jynus: replication issues for shard s7 on dbstore2001 and dbstore2002, production applications *not* affected
* 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0] (duration: 36m 59s)
* 19:31 urandom: from restbase1002; node thin_out_key_rev_value_data.js `hostname -i` local_group_wikipedia_T_parsoid_html 2>&1 | pv --line-mode | gzip -c > wikipedia_T_parsoid_html.log.gz
* 16:42 joal@deploy1002: Started deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0]
* 19:28 ori: Repooling mw1152 for further testing of HHVM scaler
* 15:54 otto@deploy1002: Finished deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration (duration: 05m 24s)
* 19:03 logmsgbot: hoo Synchronized php-1.26wmf12/extensions/Wikidata/: Update DataModel to fix SnakList (duration: 00m 20s)
* 15:48 otto@deploy1002: Started deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration
* 18:42 logmsgbot: hoo Synchronized wmf-config/mobile-labs.php: consistency (duration: 00m 12s)
* 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16777 and previous config saved to /var/cache/conftool/dbconfig/20210706-140049-root.json
* 18:41 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings-labs.php: consistency (duration: 00m 31s)
* 13:53 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 18:02 andrewbogott: restarted keystone on labcontrol1001
* 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
* 17:03 jgage: beginning puppet CA replacement procedure
* 13:49 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
* 16:06 ejegg: enabled queue consumers
* 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
* 16:05 akosiaris: re-enabling ntp everywhere
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16776 and previous config saved to /var/cache/conftool/dbconfig/20210706-134545-root.json
* 15:59 ejegg: disabled queue consumers
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16775 and previous config saved to /var/cache/conftool/dbconfig/20210706-133041-root.json
* 15:30 logmsgbot: hoo Synchronized php-1.26wmf12/extensions/Wikidata/: Remove alias uniqueness constraints (duration: 00m 21s)
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16774 and previous config saved to /var/cache/conftool/dbconfig/20210706-131537-root.json
* 15:06 urandom: restbase1002: PWD=/home/eevans/restbase-mod-table-cassandra/maintenance; node thin_out_key_rev_value_data.js `hostname -i` local_group_wikimedia_T_parsoid_html 2>&1 | pv --line-mode | gzip -c > wikimedia_T_parsoid_html.log.gz
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16773 and previous config saved to /var/cache/conftool/dbconfig/20210706-120242-root.json
* 15:05 bblack: re-enabling puppet on caches
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P16772 and previous config saved to /var/cache/conftool/dbconfig/20210706-115820-marostegui.json
* 14:59 bblack: disabling puppet on caches (because puppet always breaks when you move files/modules around...)
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16771 and previous config saved to /var/cache/conftool/dbconfig/20210706-115732-marostegui.json
* 13:57 bblack: rebooting cp2001 (test kernel update)
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16770 and previous config saved to /var/cache/conftool/dbconfig/20210706-114739-root.json
* 11:32 YuviPanda: rsync on labstore1002 finished, restarting to see what was skipped + errors
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16769 and previous config saved to /var/cache/conftool/dbconfig/20210706-113235-root.json
* 10:47 moritzm: installed patch security updates on 862 hosts
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16768 and previous config saved to /var/cache/conftool/dbconfig/20210706-111731-root.json
* 10:42 hashar: restarting Jenkins: upgrading Jenkins gearman plugin from 0.1.1-8-gf2024bd to 0.1.1-9-g08e9c42-change_192429_2  https://phabricator.wikimedia.org/T72597#1416913
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2071', diff saved to https://phabricator.wikimedia.org/P16767 and previous config saved to /var/cache/conftool/dbconfig/20210706-111635-marostegui.json
* 07:48 mobrovac: restbase restarting cassandra on rb1005
* 10:19 moritzm: installing jackson-databind security updates on buster
* 05:28 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul  1 05:28:38 UTC 2015 (duration 28m 37s)
* 09:01 _joe_: repooling wdqs1007 now that lag has caught up
* 05:27 csteipp: deployed patch for T103765
* 08:43 moritzm: installing libuv1 security updates on buster
* 04:41 logmsgbot: krinkle Synchronized php-1.26wmf12/includes/resourceloader/ResourceLoader.php: Iee884208c5c4b minify cache key (duration: 00m 11s)
* 07:06 marostegui: Upgrade db1104 kernel
* 03:10 mutante: git pull on strontium
* 06:54 moritzm: installing PHP 7.3 securiy updates on buster
* 03:00 logmsgbot: LocalisationUpdate completed (1.26wmf12) at 2015-07-01 03:00:21+00:00
* 06:50 marostegui: Upgrade db1122 kernel
* 02:53 logmsgbot: l10nupdate Synchronized php-1.26wmf12/cache/l10n: (no message) (duration: 10m 12s)
* 06:35 marostegui: Upgrade db1138 kernel
* 02:26 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-07-01 02:26:55+00:00
* 06:31 marostegui: Upgrade db1160 kernel
* 02:23 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 06m 50s)
* 00:56 eileen: process-control config revision is {{Gerrit|8d46b52ed4}}
* 02:12 springle: upgrade db1034 trusty
* 01:37 ori: Depooled mw1152. Req error dashboard shows elevated 5xx rates correlating with the server getting pooled, but the logs don't appear to corroborate it. Odd.
* 01:03 ori: Disabling Puppet on mw1152 for 12h to hack apache config to log locally
* 00:42 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I9a8018981: Double $wgMaxShellMemory on HHVM scalers (512 Mb => 1024 Mb) (duration: 00m 12s)
* 00:34 ori: pooled mw1152 (HHVM rendering) at weight 10 for testing
* 00:33 gwicke: rolling cassandra restart done
* 00:23 gwicke: starting rolling restart of cassandra nodes to apply new config
* 00:01 greg-g: we're still here


== June 30 ==
== 2021-07-05 ==
* 23:30 logmsgbot: hoo Synchronized php-1.26wmf12/extensions/Wikidata/: Fix EntityParserOutputGenerator (duration: 00m 21s)
* 17:40 legoktm: published fixed docker-registry.discovery.wmnet/nodejs10-devel:0.0.4 image ([[phab:T286212|T286212]])
* 22:55 ori: depooled mw1152
* 15:24 _joe_: leaving wdqs1007 depooled so that the updater can recover faster, now at 16.5 hours of lag
* 22:52 ori: Pooled HHVM image scaler (mw1152) at weight 1 for testing.
* 14:01 moritzm: uploaded nginx 1.13.9-1+wmf3 for stretch-wikimedoa
* 22:52 gwicke: updated restbase1004 to openjdk-8
* 12:50 marostegui: Stop MySQL on db1117:3321 to clone db1125 [[phab:T286042|T286042]]
* 22:46 bblack: restarting gitblit on antimony, because Java is so 1996
* 11:29 moritzm: installing openexr security updates on stretch
* 22:43 tgr: running eval.php (along the lines of https://gerrit.wikimedia.org/r/#/c/221783) on commonswiki to fix T104395
* 11:07 moritzm: installing tiff security updates on stretch
* 22:13 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Flow-occupy Wikipedia talk namespace on cawiki (duration: 00m 11s)
* 10:48 moritzm: upgrading PHP on miscweb*
* 22:09 matt_flaschen: Done converting wikitext namespace to Flow on Catalan Wikipedia
* 10:37 jbond: enable puppet  fleet wide to post puppetdb change
* 22:03 matt_flaschen: Started convertNamespaceFromWikitext.php for Project_talk on Catalan Wikipedia
* 10:29 marostegui: Optimize ruwiki.logging on s6 eqiad with replication [[phab:T286102|T286102]]
* 21:46 RoanKattouw: Also ran populateContentModel.php --table=archive for talk namespaces on officewiki
* 10:27 jbond: disable puppet fleet wide to preforem puppetdb change
* 21:45 RoanKattouw: Ran populateContentModel.php --table=archive --ns=5 on officewiki
* 08:15 moritzm: rolling out debmonitor-client 0.3.0
* 21:29 RoanKattouw: Ran populateContentModel.php --table=page --ns=5 on cawiki
* 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
* 21:19 logmsgbot: catrope Synchronized php-1.26wmf12/extensions/Flow: (no message) (duration: 00m 14s)
* 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
* 21:19 logmsgbot: catrope Synchronized php-1.26wmf11/extensions/Flow: (no message) (duration: 00m 14s)
* 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
* 21:14 logmsgbot: catrope Synchronized php-1.26wmf12/extensions/Flow: (no message) (duration: 00m 14s)
* 07:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
* 21:14 logmsgbot: catrope Synchronized php-1.26wmf11/extensions/Flow: (no message) (duration: 00m 13s)
* 07:04 _joe_: restarting blazegraph, then restarting the updater again
* 21:01 RoanKattouw: Running populateContentModel.php on officewiki for page table in namespaces occupied by Flow (1,3,5,7,9,11,13,15,91,93,101,111,113,829)
* 06:48 moritzm: start rasdaemon on sretest1001, didn't start after last reboot from a week ago
* 20:58 logmsgbot: catrope Synchronized php-1.26wmf12/maintenance/: Add populateContentModel maintenance script (duration: 00m 13s)
* 06:47 _joe_: restart wdqs-updater on wdqs1007
* 20:58 logmsgbot: catrope Synchronized php-1.26wmf11/maintenance/: Add populateContentModel maintenance script (duration: 00m 17s)
* 00:53 eileen: process-control config revision is {{Gerrit|a1717c7fde}}
* 20:53 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: Log 'wbq_evaluation' (duration: 00m 12s)
* 00:47 eileen: process-control config revision is {{Gerrit|24565578f7}}
* 20:46 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: Enable WikibaseQuality extensions on testwikidata (duration: 00m 14s)
* 20:39 hoo: Created `wbqc_constraints` on testwikidatawiki (s3).
* 20:23 logmsgbot: thcipriani rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.26wmf12
* 20:15 logmsgbot: thcipriani Purged l10n cache for 1.26wmf6
* 20:14 logmsgbot: thcipriani Purged l10n cache for 1.26wmf7
* 20:14 logmsgbot: thcipriani Purged l10n cache for 1.26wmf8
* 20:13 logmsgbot: thcipriani Purged l10n cache for 1.26wmf9
* 20:13 logmsgbot: thcipriani Purged l10n cache for 1.26wmf10
* 20:05 logmsgbot: thcipriani Finished scap: testwiki to php-1.26wmf12 and rebuild l10n cache (duration: 34m 58s)
* 19:41 ostriches: OAI: disabled unused accounts
* 19:30 logmsgbot: thcipriani Started scap: testwiki to php-1.26wmf12 and rebuild l10n cache
* 19:00 logmsgbot: demon Synchronized php-1.26wmf11/includes/WebResponse.php: rv my test (duration: 00m 12s)
* 18:55 logmsgbot: demon Synchronized php-1.26wmf11/includes/WebResponse.php: (no message) (duration: 00m 12s)
* 18:36 cmjohnson1: labcontrol1002 going down for a few minutes
* 18:33 mutante: tendril - short downtime for switch to new repo
* 18:17 gwicke: restarted cassandra on restbase1005 with g1gc GC and larger heap
* 18:16 gwicke: restarted cassandra on restbase1004 with g1gc GC and larger heap
* 17:02 akosiaris: enabled and ran puppet on lvs400X, lvs300X, lvs100[123]. noops
* 16:58 bblack: re-enabling puppet on caches
* 16:52 bblack: disabling puppet on cache clusters
* 16:48 akosiaris: enabled an ran puppet on all lvs servers @ codfw
* 16:22 akosiaris: enabled and ran puppet on lvs1004. noop as well
* 16:19 akosiaris: enabled and running puppet on lvs1005
* 16:11 akosiaris: enabling and running puppet on lvs1006
* 16:09 akosiaris: disabling puppet on all lvs and neon
* 16:07 gwicke: restarting cassandra instance on restbase1004
* 15:12 logmsgbot: thcipriani Synchronized wmf-config: SWAT: Standardise a ton of ticket comments [[gerrit:221803]] (duration: 00m 13s)
* 15:04 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: CX: Enable CX all wikipedias except enwiki [[gerrit:221831]] (duration: 00m 13s)
* 14:46 kart_: Update cxserver to 0d21a80
* 14:10 mobrovac: restbase restarting cassandra on restbase1005
* 11:29 mobrovac: restbase restarting cassandra on restbase1005
* 10:41 mobrovac: restbase restarting on all nodes
* 09:54 mobrovac: restbase restarting cassandra on restbase1004
* 08:53 mobrovac: restbase restrting cassandra on restbase1004
* 08:05 jynus: applying schema changes for Gather extension
* 06:56 jynus: initiating query profiling on db1018
* 05:21 gwicke: restarting cassandra instance on restbase1004; was in small-write mode
* 05:17 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1034 (duration: 00m 12s)
* 04:37 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 30 04:37:00 UTC 2015 (duration 36m 59s)
* 02:22 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-30 02:22:00+00:00
* 02:18 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 06m 09s)
* 02:11 logmsgbot: krenair Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 12s)
* 01:56 logmsgbot: krenair Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 11s)
* 01:41 logmsgbot: krinkle Synchronized php-1.26wmf11/includes/resourceloader/ResourceLoader.php: I7761242f01 (duration: 00m 14s)
* 00:37 godog: restbase1* upgrade to cassandra 2.1.7 completed


== June 29 ==
== 2021-07-04 ==
* 23:57 robh: mw2027 was offline (blank screen on serial console).  mgmt powercycled
* 17:43 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:702957{{!}}Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage" (T286140)]] (duration: 01m 06s)
* 23:48 godog: start upgrading restbase1* to cassandra 2.1.7
* 08:02 elukey: repool eqsin after equinix maintenance - [[phab:T286113|T286113]]
* 23:41 gwicke: restarted cassandra instance on restbase1004.eqiad; log showed many small writes and clients saw timeouts
* 23:29 gwicke: deployed restbase 32db4ce1e1
* 23:21 logmsgbot: ori Synchronized php-1.26wmf11/includes/resourceloader: I0e5f2d3b2: resourceloader: Add timing metrics for key operations (duration: 01m 12s)
* 23:15 logmsgbot: catrope Synchronized wmf-config/: wikitech cleanup (duration: 01m 08s)
* 23:11 RoanKattouw: ssh: connect to host mw2027.codfw.wmnet port 22: Connection timed out
* 23:11 RoanKattouw: Synced wmf-config/CommonSettings.php:  Remove survey access point in Popups
* 23:09 godog: stop ircecho on neon, icinga spam
* 22:53 gwicke: canary deploy of restbase 32db4ce1e1 on restbase1001.eqiad
* 21:30 urandom: restarting restbase1004 to apply new metrics reporting interval
* 20:19 subbu: deployed parsoid sha ea98be88
* 18:18 logmsgbot: ori Synchronized php-1.26wmf11/includes/db/LoadBalancer.php: I0e5f2d3b2: Use APC for caching slave lag times (duration: 01m 09s)
* 18:00 cmjohnson1: powering down ms-be1015
* 16:06 bblack: re-enabling puppet on caches
* 15:51 bblack: disabling puppet on caches temporarily ...
* 15:49 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/OpenStackManager: https://gerrit.wikimedia.org/r/#/c/221648/ (duration: 00m 13s)
* 15:29 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221405/ (duration: 00m 15s)
* 15:26 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221612/ (duration: 00m 12s)
* 15:24 logmsgbot: krenair Synchronized w/static/images/project-logos/zhwiki-hans-2x.png: https://gerrit.wikimedia.org/r/#/c/221113/ (duration: 00m 14s)
* 15:24 logmsgbot: krenair Synchronized w/static/images/project-logos/zhwiki-hans-1.5x.png: https://gerrit.wikimedia.org/r/#/c/221113/ (duration: 00m 12s)
* 15:23 logmsgbot: krenair Synchronized w/static/images/project-logos/zhwiki-hans.png: https://gerrit.wikimedia.org/r/#/c/221113/ (duration: 00m 12s)
* 15:20 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/221009/ (duration: 00m 11s)
* 15:18 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/221047/ (duration: 00m 13s)
* 15:12 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/ContentTranslation/modules/tools/ext.cx.tools.link.js: https://gerrit.wikimedia.org/r/#/c/221605 (duration: 00m 13s)
* 15:02 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/ContentTranslation/modules/tools/ext.cx.tools.formatter.js: https://gerrit.wikimedia.org/r/#/c/221604/ (duration: 00m 14s)
* 14:34 jynus: rebooting and reinstalling db1022
* 12:06 YuviPanda: restarting rsync with new exclusions file on labstore1002 to codfw
* 12:06 YuviPanda: excluded maps, mwoffliner and video project from rsync of broken FS to speed it up
* 11:59 YuviPanda: interupt rsync on labstore1001 to prevent it from copying mwofflienr files
* 11:00 _joe_: shutting down etcd1003, cleaning exported resources
* 10:32 _joe_: effectively removing etcd1003 from the cluster
* 10:17 _joe_: starting removal of etcd1003 from the etcd cluster
* 08:49 _joe_: joined conf1003 to the etcd cluster
* 08:20 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Depool db1022 for reinstall (duration: 00m 12s)
* 08:12 _joe_: adding conf1002 to the etcd cluster as a member
* 07:46 akosiaris: disabling ntp everywhere expect selected hosts in anticipation for the leap second
* 04:51 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 29 04:51:48 UTC 2015 (duration 51m 47s)
* 03:08 jgage: jmxtrans filled disks on all kafka brokers, 21GB log files. removed logs and restarted services.
* 02:23 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-29 02:23:47+00:00
* 02:20 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 05m 53s)
* 00:52 springle: restart eventlogging auto-purge on m4
* 00:51 springle: restart replication on dbstore2002
* 00:00 springle: pausing replication on dbstore2002


== June 28 ==
== 2021-07-03 ==
* 23:51 logmsgbot: ori Synchronized php-1.26wmf11/extensions/CentralNotice/modules/ext.centralNotice.bannerController/bannerController.js: I6ffdc977e87: Parse older format of Geo cookies (duration: 00m 13s)
* 17:46 elukey: depool eqsin due to loss of power redundancy (equinix maintenance) - [[phab:T286113|T286113]]
* 04:30 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 28 04:30:54 UTC 2015 (duration 30m 53s)
* 09:12 Amir1: restarting mailman3-web on lists1001 to pick up patches for [[phab:T283659|T283659]]
* 02:20 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-28 02:20:52+00:00
* 08:53 Amir1: patching postorius and mailmanclient on lists1001 for [[phab:T283659|T283659]]
* 02:17 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 05m 56s)


== June 27 ==
== 2021-07-02 ==
* 23:30 bd808: Deleted corrupt shards on logstash1004 and logstash1005. Recovery in process
* 22:06 foks: removing three files for legal compliance
* 20:12 ori: Delegated full access to Google Webmaster Tools for myself (olivneh@).
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 04:58 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 27 04:58:46 UTC 2015 (duration 58m 45s)
* 19:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 02:23 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-27 02:23:40+00:00
* 18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:20 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 05m 46s)
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:22 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:59 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:54 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 15:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:17 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dragonfly-supernode1001.eqiad.wmnet
* 15:07 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 15:05 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dragonfly-supernode1001.eqiad.wmnet
* 15:02 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 14:54 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 14:53 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
* 14:52 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 14:40 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[0-1].eqiad.wmnet
* 14:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-9].eqiad.wmnet
* 14:38 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
* 14:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw142[0-1].eqiad.wmnet
* 14:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-9].eqiad.wmnet
* 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw142[0-1].eqiad.wmnet
* 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw141[4-9].eqiad.wmnet
* 14:15 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw142[0-1].eqiad.wmnet
* 14:12 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw141[4-9].eqiad.wmnet
* 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry[2005-2008].codfw.wmnet
* 13:54 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry[2005-2008].codfw.wmnet
* 13:32 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=registry200[5-8].codfw.wmnet,dc=codfw,cluster=docker-registry
* 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 13:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
* 13:11 mutante: mw2380 - rebooting
* 13:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
* 12:24 moritzm: added btullis to pwstore
* 12:06 mutante: mw2380 /puppetmaster: reimaged, revoking old cert, signing new cert, initial puppet run [[phab:T285603|T285603]]
* 11:51 mutante: mw2380 - PXE booting - does not boot from hard disk
* 11:28 mutante: powercycling mw2380, trying to make it boot
* 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 11:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 11:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 10:33 jforrester@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/WikibaseMediaInfo: UploadWizard/WikibaseMediaInfo fix {{Gerrit|3fd2873}} for [[phab:T285579{{!}}T285579]] (duration: 00m 59s)
* 09:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1268.eqiad.wmnet
* 09:37 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:702808{{!}}Fix handling of geEnabled flag (T285996)]] (duration: 00m 57s)
* 09:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1268.eqiad.wmnet
* 09:24 godog: test thanos 0.21.1 locally on thanos-fe2001 and depool the host - [[phab:T285835|T285835]]
* 09:19 dcausse: restart blazegraph on wdqs1013
* 09:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1267.eqiad.wmnet
* 09:04 mutante: decom'ing mw1267
* 09:02 moritzm: installing node-hosted-git-info security updates
* 09:02 tgr: deploying emergency backport: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/702808
* 08:54 moritzm: installing  golang-docker-credential-helpers security updates
* 08:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1267.eqiad.wmnet
* 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 08:03 moritzm: installing ipmitool security updates
* 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1268.eqiad.wmnet
* 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1267.eqiad.wmnet
* 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
* 07:25 dcausse: installing openjdk-8-dbg on wdqs1013
* 03:14 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo run-puppet-agent --force'`
* 03:11 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo apt update'` fixed the issue
* 03:07 ryankemper: [[phab:T264053|T264053]] `Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install elasticsearch-madvise' returned 100: Reading package lists...` grr
* 03:07 ryankemper: [[phab:T264053|T264053]] `ryankemper@elastic2054:~$ sudo run-puppet-agent --force`
* 03:06 ryankemper: [[phab:T264053|T264053]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/702791; will run puppet on single host
* 03:05 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo disable-puppet "verify new deb package works - [[phab:T264053|T264053]]"'`
* 03:02 legoktm: uploaded elasticsearch-madvise_0.1~deb9u1_amd64.changes to stretch-wikimedia on apt1001
* 01:47 eileen: civicrm revision changed from {{Gerrit|e07c2be1a7}} to {{Gerrit|bb62188ec6}}, config revision is {{Gerrit|1739c53fcb}}
* 01:16 legoktm: uploaded elasticsearch-madvise 0.1 to apt.wm.o ([[phab:T264053|T264053]])


== June 26 ==
== 2021-07-01 ==
* 23:57 bd808: Logstash log ingestion working again after forcing recovery of replicas for logstash-2015.06.26; new logs were being rejected with only a primary shard available
* 23:29 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:702777{{!}}Revert "deployment training: readme whitespace"]] (duration: 00m 56s)
* 23:54 bd808: re-enabled allocation on logstash elasticsearch cluster
* 23:21 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:702774{{!}}deployment training: readme whitespace]] (duration: 00m 57s)
* 23:05 bblack: restarted gitblit on antimony, AGAIN
* 22:37 urbanecm: Start server-side upload for 1 video file ([[phab:T285182|T285182]])
* 22:57 mutante: restarted gitblit
* 22:36 urbanecm: Start server-side upload for 1 video file ([[phab:T285789|T285789]])
* 22:43 logmsgbot: catrope Synchronized php-1.26wmf11/extensions/Flow: Temporarily make subpages in Flow-occupied namespaces non-Flow again (duration: 00m 14s)
* 22:31 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:702704{{!}}Use train-versions.json to map from version to image tag (T282824)]] (duration: 00m 57s)
* 22:36 bd808: set indices.recovery.concurrent_streams to 4 on logstash ES cluster
* 22:27 urbanecm: Start server-side upload for 1 video file ([[phab:T285682|T285682]])
* 22:36 godog: set indices.recovery.max_bytes_per_sec to 10mb on logstash ES cluster
* 21:43 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: [[gerrit:702755{{!}}Temporarily disable notification for security patch failures]] (duration: 00m 57s)
* 22:25 godog: set indices.recovery.max_bytes_per_sec to 50mb on logstash ES cluster
* 19:45 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.12
* 22:25 jamesofur: Reset email address of User:Chwms identity verified in person at editathon
* 19:41 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 12s)
* 22:09 bd808: restarted logstash on logstash1001
* 19:39 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
* 21:10 urandom: taking xenon down to be rebootstrapped
* 19:35 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/tests/phpunit/includes/TitleMethodsTest.php: Backport: [[gerrit:702711{{!}}Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
* 20:10 bd808: Deleted 4 corrupt indices (logstash-2015.05.30 logstash-2015.05.31 logstash-2015.06.03 logstash-2015.06.06) on logstash1004
* 19:34 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/Title.php: Backport: [[gerrit:702711{{!}}Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
* 19:58 bd808: stopping elasticsearch on logstash1004 to cleanup corrupt shards
* 19:18 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/.pipeline/config.yaml: Backport: [[gerrit:702168{{!}}Trigger update-train-versions job at end of wmf-publish pipeline]] (duration: 01m 08s)
* 17:05 mutante: zirconium - manual cleanup, removing planet
* 18:55 otto@deploy1002: Finished deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883] (duration: 05m 19s)
* 17:04 godog: reverted cronolog puppetmaster patch, restarting apache
* 18:50 otto@deploy1002: Started deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883]
* 14:17 Krenair: Deployed patch for T103391
* 18:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7995f7abe3b94eb0326064cbbd1d3111f8f21365}}: Use Vue.js for QuickSurveys on available wikis ([[phab:T285890|T285890]]) (duration: 01m 09s)
* 12:23 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/221105/ (duration: 00m 12s)
* 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|654877f92fa18ae766d693630025c69400cad3ac}}: EventDispatcher: Ensure we fetch page content from the primary database ([[phab:T285895|T285895]]) (duration: 01m 12s)
* 12:18 _joe_: added conf1001 to the etcd cluster
* 18:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|6d9043087ec421e1321cd6797934928e2651b1c1}}: EventDispatcher: Ensure we fetch page content from the primary database ([[phab:T285895|T285895]]) (duration: 01m 14s)
* 07:57 logmsgbot: krinkle Synchronized php-1.26wmf11/extensions/Popups: T103610 (duration: 00m 11s)
* 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:04 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 26 06:04:14 UTC 2015 (duration 4m 13s)
* 16:28 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.12"
* 05:22 twentyafterfour: restarted apache on iridium to fix phabricator fatal
* 16:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 02:33 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-26 02:33:33+00:00
* 16:23 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: [[phab:T285959|T285959]] (duration: 01m 20s)
* 02:30 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 05m 36s)
* 16:11 vgutierrez: restart varnish-fe on cp3059 - [[phab:T285953|T285953]]
* 00:51 gwicke: reverted restbase1001 canary to 90817c2a
* 14:58 papaul: poweroff mw2380 for disk replacement
* 00:36 logmsgbot: ori Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: I0e5f2d3b2: Updated mediawiki/core Project: mediawiki/extensions/SyntaxHighlight_GeSHi (duration: 00m 11s)
* 14:57 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
* 00:16 logmsgbot: krinkle Synchronized wmf-config/InitialiseSettings.php: T102852 (duration: 00m 12s)
* 14:53 effie: depool mw2380 for disk repair - [[phab:T285603|T285603]]
* 00:15 logmsgbot: krinkle Synchronized w/static/images/project-logos/zhwiki-2x.png: T102852 (duration: 00m 13s)
* 14:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 00:14 logmsgbot: krinkle Synchronized w/static/images/project-logos/zhwiki-1.5x.png: T102852 (duration: 00m 12s)
* 14:51 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:05 logmsgbot: krinkle Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi/modules/pygments.wrapper.css: I5d1510dc80d6d4712ca8411 (duration: 00m 12s)
* 14:45 moritzm: installing glib2.0 security updates on buster
* 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts maps2002.codfw.wmnet
* 13:35 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts maps2002.codfw.wmnet
* 13:03 marostegui: Deploy schema change on s2 eqiad master [[phab:T276150|T276150]]
* 12:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1266.eqiad.wmnet
* 12:39 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1266.eqiad.wmnet
* 12:37 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1264-1265].eqiad.wmnet
* 12:23 tgr: EU deploys done
* 12:22 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/: Backport: [[gerrit:702402{{!}}Welcome tour: Mark as complete when notice is shown (T284800)]] [[gerrit:702404{{!}}SuggestedEdits: Return default JS data as 'noresults' (T285906)]] (duration: 01m 08s)
* 12:20 tgr@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: Backport: [[gerrit:702401{{!}}Welcome tour: Mark as complete when notice is shown (T284800)]] [[gerrit:702403{{!}}SuggestedEdits: Return default JS data as 'noresults' (T285906)]] (duration: 01m 09s)
* 12:19 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1264-1265].eqiad.wmnet
* 12:09 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1263.eqiad.wmnet
* 11:58 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1263.eqiad.wmnet
* 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/: Backport: [[gerrit:702400{{!}}Stop using legacy entityNamespaces setting in onSetupAfterCache hook (T285472)]] (duration: 01m 15s)
* 11:46 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1262.eqiad.wmnet
* 11:35 elukey: reboot ml-serve-ctrl200[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
* 11:35 marostegui: Deploy schema change on s8 eqiad master [[phab:T276150|T276150]]
* 11:33 elukey: reboot ml-serve-ctrl100[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
* 11:33 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1262.eqiad.wmnet
* 11:19 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:697851{{!}}Avoid using MWNamespace]] (duration: 01m 06s)
* 11:07 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:27 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:05 moritzm: installing remaining libgcrypt20 security updates
* 09:56 moritzm: installing remaining gnutls28 security updates
* 09:55 Amir1: start of clean up of autoreview logs in ruwiki ([[phab:T285608|T285608]])
* 09:47 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:36 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:36 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:35 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:35 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:05 marostegui: Deploy schema change on s1 eqiad (db1157) master [[phab:T277123|T277123]]
* 08:52 marostegui: Deploy schema change on s1 eqiad (db1163) master [[phab:T277123|T277123]]
* 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1261.eqiad.wmnet
* 08:28 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1261.eqiad.wmnet
* 08:23 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw126[2-6].eqiad.wmnet
* 08:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw126[2-6].eqiad.wmnet
* 08:13 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1261.eqiad.wmnet
* 08:11 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
* 07:06 marostegui: Deploy schema change on s4 eqiad (db1138) master [[phab:T277123|T277123]]
* 06:34 marostegui: Deploy schema change on s7 eqiad (db1136) masters [[phab:T277123|T277123]]
* 06:31 marostegui: Deploy schema change on s2,s8 eqiad masters [[phab:T277123|T277123]]
* 05:57 marostegui: Deploy schema change on s5 eqiad master (db1130) [[phab:T277123|T277123]]
* 05:55 marostegui: Deploy schema change on s6 eqiad master (db1173) [[phab:T277123|T277123]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P16750 and previous config saved to /var/cache/conftool/dbconfig/20210701-055243-marostegui.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P16749 and previous config saved to /var/cache/conftool/dbconfig/20210701-052702-marostegui.json
* 04:48 marostegui: Disconnect eqiad -> codfw replication from s1-s8


== June 25 ==
== 2021-06-30 ==
* 23:53 mutante: planet1001 (ganeti) - signing puppet cert, initial run
* 23:28 urbanecm: Evening B&C window finished
* 23:31 mutante: apt-get upgrade on zirconium
* 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|667d88054097b195208818aee15bb1eb58955437}}: Add Parsoid to wmgMonologChannels with warning level (duration: 01m 07s)
* 23:28 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/220847/ (duration: 00m 12s)
* 23:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: {{Gerrit|8e719d54baa4c26aaa090e02503b0d9473301cce}}: Add Parsoid to wmgMonologChannels (duration: 00m 38s)
* 23:27 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/220847/ (duration: 00m 11s)
* 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8e719d54baa4c26aaa090e02503b0d9473301cce}}: Add Parsoid to wmgMonologChannels (duration: 01m 07s)
* 23:24 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: https://gerrit.wikimedia.org/r/#/c/220997/ (duration: 00m 13s)
* 21:43 Amir1: deleting auto-review logs from test2wiki ([[phab:T285608|T285608]])
* 23:20 gwicke: canary update of restbase on restbase1001 to 4b961f166 (deploy d1c4d9961)
* 21:40 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T284931|T284931]] [[phab:T284459|T284459]] [[phab:T284394|T284394]])
* 23:16 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/218926/ (duration: 00m 12s)
* 21:29 cstone: civicrm revision changed from {{Gerrit|789c92d13b}} to {{Gerrit|e07c2be1a7}}
* 23:11 logmsgbot: krenair Synchronized wmf-config/logging.php: https://gerrit.wikimedia.org/r/#/c/220784/ (duration: 00m 13s)
* 21:23 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T284931|T284931]] [[phab:T284459|T284459]] [[phab:T284394|T284394]])
* 23:03 legoktm: fixed content models on lrcwiki for Module namespace
* 19:06 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 07s)
* 23:02 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220485/ (duration: 00m 16s)
* 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
* 22:02 logmsgbot: hoo Synchronized php-1.26wmf11/extensions/Wikidata/: Update Wikidata: Use SELECT FOR UPDATE in SqlIdGenerator (duration: 00m 20s)
* 18:57 legoktm: legoktm@mwmaint2002:~$ sudo systemctl start mediawiki_job_purge_parsercache_pc[123] # to start split purge jobs ahead of the timers
* 21:29 godog: rm /var/lib/git/operations/puppet/modules/cassandra from labcontrol1001 labcontrol1002
* 18:54 legoktm: legoktm@mwmaint2002:~$ sudo systemctl stop mediawiki_job_parser_cache_purging.service # to stop zombie service
* 21:10 godog: rm /var/lib/git/operations/puppet/modules/cassandra from rhodium
* 18:53 Amir1: adding urbanecm as admin of newprojects mailing list
* 21:07 godog: rm /var/lib/git/operations/puppet/modules/cassandra from strontium and palladium
* 18:12 Jeff_Green: authdns-update to deploy A/PTR records for frdev1002.frack.eqiad.wmnet
* 21:06 godog: push puppet.git after module/cassandra removal T92560
* 17:57 thcipriani: restart ci jenkins following upgrade
* 20:41 mutante: deleted SVN monitor from watchmouse
* 17:54 thcipriani: restart releases-jenkins following upgrade
* 20:18 mutante: bye SVN - subversion URLs now redirect to phab or doc
* 17:16 moritzm: imported jenkins 2.289.2 to thirdparty/ci [[phab:T285532|T285532]]
* 20:08 logmsgbot: nikerabbit Finished scap: T103888 CX aliases (duration: 22m 37s)
* 16:30 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki 'Tech/Server_switch_2020' 'Tech/Server_switch' 'Martin Urbanec' --move-subpages --reason='per [[:phab:T285866]]' # [[phab:T285866|T285866]]
* 19:46 logmsgbot: nikerabbit Started scap: T103888 CX aliases
* 16:10 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 46s)
* 18:09 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf11
* 16:08 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 01s)
* 17:46 logmsgbot: krenair Synchronized wmf-config: (no message) (duration: 00m 31s)
* 16:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 20s)
* 17:43 logmsgbot: krenair Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/218098/ (duration: 00m 12s)
* 16:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 16s)
* 17:43 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/218098/ (duration: 00m 12s)
* 16:03 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 17s)
* 17:18 logmsgbot: ori Synchronized php-1.26wmf11/resources/src/mediawiki.skinning/elements.css: Ieab6b1473e6ce: תיקון טעות (duration: 00m 12s)
* 16:02 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating banwikisource ([[phab:T284389|T284389]])
* 15:59 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/219599/ (duration: 00m 12s)
* 16:00 urbanecm@deploy1002: Synchronized dblists: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 17s)
* 15:57 logmsgbot: krenair Synchronized wmf-config/CommonSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/217539/ - noop for prod, labs only part (duration: 00m 12s)
* 15:58 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 14s)
* 15:56 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/217539/ (duration: 00m 13s)
* 15:57 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 13s)
* 15:51 logmsgbot: krenair Synchronized wmf-config/flaggedrevs.php: https://gerrit.wikimedia.org/r/#/c/203370/ (duration: 00m 12s)
* 15:48 urbanecm@deploy1002: Synchronized langlist: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 16s)
* 15:49 logmsgbot: krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/218539/ (duration: 00m 15s)
* 15:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 16s)
* 15:32 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/220068/ - noop for prod, just labs (duration: 00m 12s)
* 15:46 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 13s)
* 15:30 logmsgbot: krenair Synchronized commonsuploads.dblist: https://gerrit.wikimedia.org/r/#/c/220715/ (duration: 00m 12s)
* 15:44 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 15s)
* 15:24 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220747/ (duration: 00m 12s)
* 15:43 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating shiwiki ([[phab:T284885|T284885]])
* 15:16 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220408/ (duration: 00m 12s)
* 15:41 urbanecm@deploy1002: Synchronized dblists: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 14s)
* 15:12 logmsgbot: krenair Synchronized php-1.26wmf11/extensions/SemanticForms/includes/SF_AutoeditAPI.php: https://gerrit.wikimedia.org/r/#/c/220765/ (duration: 00m 12s)
* 15:40 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 14s)
* 15:04 logmsgbot: krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/220706/ (duration: 00m 12s)
* 15:38 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 14s)
* 15:02 logmsgbot: krenair Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/220653/ (duration: 00m 12s)
* 15:31 urbanecm@deploy1002: Synchronized langlist: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 12s)
* 13:30 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: Repool es2003 (but not es2004) after maintenance (duration: 00m 12s)
* 15:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 14s)
* 10:57 jynus: rebooting es2003 and es2004
* 15:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 10:40 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: depool es2003 and es2004 for maintenance (duration: 00m 13s)
* 15:27 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 10:09 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: repool db1018 (duration: 00m 12s)
* 15:26 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating dagwiki ([[phab:T284450|T284450]])
* 09:02 jynus: restarting mysqld on db1018
* 15:25 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=dagwiki --cluster=all # [[phab:T284450|T284450]]
* 08:42 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: depool db1018 for maintenance (duration: 00m 13s)
* 15:24 urbanecm@deploy1002: Synchronized dblists: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 08:33 logmsgbot: ori Synchronized php-1.26wmf11/resources/src/mediawiki.skinning/elements.css: I0e5f2d3b2: Wrap lines in <nowiki><pre></nowiki> and .mw-code by default (duration: 00m 12s)
* 15:22 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 13s)
* 06:59 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 25 06:59:13 UTC 2015 (duration 59m 12s)
* 15:21 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 04:04 ori: restarted apache2 on palladium
* 15:07 sukhe: restarted dnsdist.service and pdns-recursor.service on O:wikidough to install gnutls/gcrypt updates
* 03:11 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-25 03:11:01+00:00
* 15:06 urbanecm: sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1'
* 03:04 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 10m 19s)
* 13:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 02:40 bblack: puppet re-enabled on caches
* 13:26 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 02:37 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-25 02:37:44+00:00
* 13:26 moritzm: installing fluidsynth security updates on stretch
* 02:34 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 06m 44s)
* 13:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
* 02:04 bblack: disabling puppet on cp* caches for patch-testing
* 13:18 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
* 00:43 awight: update crm from bd8a00196071ddd04efbff7b30567dd9357c9000 to e923225e423948bd70440e2d1131460b10cefac1
* 13:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
* 00:38 godog: upgrade cassandra to 2.1.7 on restbase1008
* 13:11 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
* 00:30 twentyafterfour: phabricator upgrade completed
* 13:04 mutante: switching docker-registry to nginx light variant [[phab:T164456|T164456]]
* 00:28 godog: upgrade cassandra to 2.1.7 on restbase1004
* 13:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
* 00:12 legoktm: <twentyafterfour> Phabricator upgrade happening now. Will be down for a few minutes.
* 12:53 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
* 12:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
* 12:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
* 12:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
* 12:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
* 12:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
* 12:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
* 12:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
* 12:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
* 12:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
* 12:17 kart_: Updated cxserver to 2021-06-30-112813-production ([[phab:T284900|T284900]], [[phab:T284885|T284885]])
* 12:13 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
* 12:11 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:06 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 12:01 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 11:46 Lucas_WMDE: EU backport+config window done
* 11:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:701505{{!}}Remove $wmgWikibaseClientRepoConceptBaseUri (T257260)]] (2/2, beta) (disregard the earlier /3, I’m skipping the test file after all) (duration: 01m 04s)
* 11:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:701505{{!}}Remove $wmgWikibaseClientRepoConceptBaseUri (T257260)]] (1/3, prod) (duration: 01m 16s)
* 11:35 moritzm: rolling restart of FPM/Apache on mw canaries to pick up gnutls/gcrypt security updates
* 11:11 moritzm: installing libgcrypt security updates on buster
* 11:09 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug2001:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1' # clean up old l10n cache
* 11:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:701504{{!}}Stop setting Wikibase client repoConceptBaseUri (T257260)]] (duration: 01m 24s)
* 10:44 moritzm: installing gnutls security updates on buster
* 10:31 godog: add 200G to prometheus/eqiad for 'ops' instance
* 09:35 godog: start swiftrepl-mw on ms-fe2005 post-switchover (credentials were missing) - [[phab:T162123|T162123]]
* 08:51 jelto: jelto@puppetmaster1001:~$ sudo puppet cert -s gitlab2001.wikimedia.org # approve puppet certificate request for gitlab2001, fingerprint checked
* 08:47 topranks: Removing BGP peers for AS48237 (Etihad Etisalat) and AS11404 (Wave Division Holdings) from cr2-eqiad (peers have left Equinix IX)
* 08:31 godog: remove sdf1 from thanos-be1003 in swift - [[phab:T285835|T285835]]
* 07:43 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thanos-be1003.eqiad.wmnet
* 07:43 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
* 07:43 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host thanos-be1003.eqiad.wmnet
* 07:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
* 05:46 ryankemper: [Cirrus] Unbanned `elastic2045`; now only `elastic2033` is banned in `codfw`
* 00:36 tstarling@deploy1002: Synchronized wmf-config/db-labs.php: gerrit 701995 SQL query log (duration: 01m 05s)
* 00:35 tstarling@deploy1002: Synchronized wmf-config/db-eqiad.php: gerrit 701995 SQL query log (duration: 01m 06s)
* 00:34 tstarling@deploy1002: Synchronized wmf-config/db-codfw.php: gerrit 701995 SQL query log (duration: 01m 06s)
* 00:32 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: gerrit 701995 SQL query log (duration: 01m 05s)
* 00:31 tstarling@deploy1002: Synchronized docroot/noc/db.php: gerrit 701995 SQL query log (duration: 01m 06s)
* 00:27 tstarling@deploy1002: Synchronized wmf-config/logging.php: gerrit 701995 SQL query log (duration: 01m 15s)
* 00:01 urbanecm: (following up previous SAL item) TrainBranchBot was removed from wmf-deployment group because of [[phab:T285819|T285819]]


== June 24 ==
== 2021-06-29 ==
* 23:18 logmsgbot: rmoen Synchronized wmf-config/mobile.php: Enable browse experiment on test and enwiki (duration: 00m 14s)
* 23:45 urbanecm: Evening B&C window done
* 23:17 logmsgbot: rmoen Synchronized wmf-config/InitialiseSettings.php: Enable browse experiment on test and enwiki (duration: 00m 12s)
* 23:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|367bc98}}: {{Gerrit|904d18720}}: flood flag changes for enwikibooks ([[phab:T285594|T285594]]) (duration: 01m 07s)
* 23:13 urandom: rolling restart of Cassandra staging cluster
* 23:45 urbanecm: Remove TrainBranchBot from wmf-deployment Gerrit group, merges code to mediawiki-config without actually deploying it
* 23:04 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/CentralAuth: https://gerrit.wikimedia.org/r/#/c/220637/ (duration: 00m 13s)
* 23:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: {{Gerrit|8a5b835050cc0f6d47b6fde2317db8743fcb9ce0}}: SpecialEditGrowthConfig: Do not use relative => true ([[phab:T285750|T285750]]) (duration: 01m 04s)
* 23:03 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/UserMerge: https://gerrit.wikimedia.org/r/#/c/220638/ (duration: 00m 13s)
* 23:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: {{Gerrit|c61fb175c82accda526105cc32457b07530d09fa}}: SpecialEditGrowthConfig: Do not use relative => true ([[phab:T285750|T285750]]) (duration: 01m 05s)
* 22:32 mutante: zirconium - stop using 443 at all, rm NameVirtualHost *:443
* 23:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/: {{Gerrit|bad82665f8bea667aff049794612c270063c7519}}: Config option to enable topic subscriptions backend and dtenable=1 URL parameter ([[phab:T284491|T284491]]) (duration: 01m 05s)
* 22:30 mutante: zirconium - deleting unused apache configs, bugzilla, etherpad, ...
* 23:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/: {{Gerrit|bad82665f8bea667aff049794612c270063c7519}}: Config option to enable topic subscriptions backend and dtenable=1 URL parameter ([[phab:T284491|T284491]]) (duration: 01m 06s)
* 21:09 godog: start cassandra on restbase1008
* 23:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/: {{Gerrit|e77e002130a7815570ff967013757bacc7037fb0}}: Config option to enable topic subscriptions backend and dtenable=1 URL parameter ([[phab:T284491|T284491]]) (duration: 01m 09s)
* 18:41 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to 1.26wmf11
* 21:58 maryum: deployed security patch [[phab:T285515|T285515]] to wmf.12
* 18:02 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/Flow/includes/Specials/SpecialEnableFlow.php: https://gerrit.wikimedia.org/r/#/c/220514/ (duration: 00m 15s)
* 21:51 maryum: deployed security patch [[phab:T285515|T285515]] to wmf.11
* 17:24 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: repool es2001 and es2002 after maintenance (duration: 00m 13s)
* 21:44 maryum: deployed updated security patch for [[phab:T285190|T285190]] to wmf.12
* 17:05 thcipriani: scap completed with the exception of snapshot1001 that's disk is full
* 21:42 maryum: deployed updated security patch for [[phab:T285190|T285190]] to wmf.11
* 17:04 logmsgbot: thcipriani scap failed: OSError [Errno 2] No such file or directory: '/var/lock/scap' (duration: 41m 33s)
* 21:31 sbassett: Reverted and deployed updated security patch for [[phab:T285190|T285190]] to wmf.12
* 16:22 logmsgbot: thcipriani Started scap: SWAT: Automatically add to shell group when adding to a project [[gerrit:220468]]
* 21:29 sbassett: Reverted and deployed updated security patch for [[phab:T285190|T285190]] to wmf.11
* 16:10 logmsgbot: ori Synchronized php-1.26wmf11/includes/page/Article.php: I0e5f2d3b2: Revert r47388 / 8d9243cf3: Use Title::getLocalURL() for rel=canonical links (duration: 00m 13s)
* 21:19 sbassett: Deployed updated security patch for [[phab:T285190|T285190]] to wmf.11
* 15:57 logmsgbot: thcipriani Synchronized wmf-config: SWAT: Revert Enable browse prototype on test- and enwiki (duration: 00m 15s)
* 20:55 dancy: Deleted all CDB files on beta so they'll be recreated on the next scap sync-world run
* 15:49 jynus: rebooting es2001 and es2002
* 20:26 dancy: Reverting to scap 3.17.1-1+0~20210419163335.8~1.gbpa6b2e0 in beta
* 15:44 logmsgbot: thcipriani Synchronized wmf-config: SWAT: Enable browse prototype on test- and enwiki [[gerrit:219451]] (duration: 00m 12s)
* 19:43 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: REIMAGE
* 15:24 logmsgbot: thcipriani Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable ContentTranslation in testwiki [[gerrit:220385]] (duration: 00m 12s)
* 19:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: REIMAGE
* 15:17 logmsgbot: thcipriani Synchronized php-1.26wmf11/extensions/ContentTranslation: SWAT: Enable publish button when the preference is not to use initial translation (duration: 00m 12s)
* 19:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: REIMAGE
* 15:14 andrewbogott: disabled puppet on labcontrol1001 to hotfix https://gerrit.wikimedia.org/r/#/c/220476/
* 19:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: REIMAGE
* 15:08 logmsgbot: thcipriani Synchronized php-1.26wmf10/extensions/ContentTranslation: SWAT: Enable publish button when the preference is not to use initial translation (duration: 00m 13s)
* 19:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: REIMAGE
* 14:53 logmsgbot: jynus Synchronized wmf-config/db-codfw.php: depool es2001 and es 2002 for maintenance (duration: 00m 13s)
* 19:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: REIMAGE
* 14:12 logmsgbot: krenair Synchronized php-1.26wmf10/extensions/SemanticForms/includes/SF_AutoeditAPI.php: T103653 live hack (duration: 00m 13s)
* 19:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: REIMAGE
* 10:44 _joe_: restarting jmxtrans on analytics1021
* 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: REIMAGE
* 10:31 jgage: restarting kafka on analytics1021
* 19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.12
* 10:10 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Switchover master es1008 -> es1009 (duration: 00m 12s)
* 18:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 09:24 hashar: removing java 6 from gallium and lanthanum https://phabricator.wikimedia.org/T103491
* 18:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 09:17 hashar: apt-get upgrade on gallium and lanthanum
* 18:34 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc3
* 09:16 jynus: performing a master failover of es1008 into es1009
* 18:28 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc2
* 08:27 logmsgbot: jynus Synchronized wmf-config/db-eqiad.php: Repool es1004 (duration: 00m 14s)
* 18:21 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc1
* 05:46 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jun 24 05:46:32 UTC 2015 (duration 46m 31s)
* 18:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 05:12 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1045 (duration: 00m 13s)
* 18:09 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 05:03 jgage: removed old logs and did 'apt-get clean' on analytics1021 to make space
* 18:07 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.7 (duration: 04m 00s)
* 03:00 logmsgbot: LocalisationUpdate completed (1.26wmf11) at 2015-06-24 03:00:45+00:00
* 17:59 urbanecm: Start server-side upload of ~2.5G of JPG files ([[phab:T282755|T282755]])
* 02:54 logmsgbot: l10nupdate Synchronized php-1.26wmf11/cache/l10n: (no message) (duration: 10m 34s)
* 17:52 brennen@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.12 (duration: 57m 11s)
* 02:28 logmsgbot: LocalisationUpdate completed (1.26wmf10) at 2015-06-24 02:28:16+00:00
* 16:55 ryankemper: [[phab:T281327|T281327]] `[Cirrus -> codfw]` Current banned nodes are`elastic2043` and `elastic2045`; `elastic2043` can be unbanned after a re-image, and `elastic2045` can be unbanned in ~30 minutes after shards rebalance (had heavy shards scheduled)
* 02:24 logmsgbot: l10nupdate Synchronized php-1.26wmf10/cache/l10n: (no message) (duration: 07m 21s)
* 16:55 brennen@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.12
* 01:39 logmsgbot: ori Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: I0e5f2d3b2 (duration: 00m 13s)
* 16:45 brennen: 1.37.0-wmf.12 was branched at {{Gerrit|3703c3194b590a1fcccb485245022eac369d2b69}} for [[phab:T281153|T281153]]
* 01:01 gwicke: rolling restart of cassandra instances to rule out a single node in funky state causing elevated p99 latency
* 16:28 ebernhardson: temporarily ban elastic2045 from production-search-codfw
* 00:43 ori: experimenting with httpd on mw1041 again
* 15:43 dcausse: unbanning elastic2054
* 00:19 gwicke: rolling restart of restbase instances to rule out backend connections as a source for high p99 latencies
* 15:30 dcausse: restarting blazegraph on wdqs1012
* 00:14 ori: experimenting with HHVM shutdown via /stop on the admin server on mw1041
* 15:17 effie: pool mw2383 back
* 15:15 mutante: [mwlog2002:~] $ sudo systemctl start mw-log-cleanup
* 15:06 dcausse: banning elastic2054
* 14:53 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[1-2].codfw.wmnet,service=canary
* 14:52 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[8-9].codfw.wmnet,service=canary
* 14:52 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw225[1-2].codfw.wmnet,service=canary
* 14:52 effie: depool mw2383 as it is misbehaving
* 14:47 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 14:47 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 14:47 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw226[1-2].codfw.wmnet
* 14:47 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2290.codfw.wmnet
* 14:46 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 14:46 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw22[7-8][0-9].codfw.wmnet
* 14:45 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw225[1-8].codfw.wmnet
* 14:44 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 14:44 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw225[1-8].codfw.wmnet,service=api_appserver
* 14:43 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 14:38 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 14:38 _joe_: restarting pohp-fpm on mw2383
* 14:38 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 14:37 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2103 (s1) weight a bit', diff saved to https://phabricator.wikimedia.org/P16739 and previous config saved to /var/cache/conftool/dbconfig/20210629-143742-marostegui.json
* 14:37 _joe_: repooling mw2383
* 14:36 _joe_: depooling mw2383
* 14:30 legoktm@deploy1002: Synchronized wmf-config/db-codfw.php: fix trwikivoyage (duration: 01m 01s)
* 14:29 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
* 14:29 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
* 14:28 Krinkle: TODO: Don't duplicate `sectionsByDB` between db-* files
* 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 14:23 jayme@cumin1001: MediaWiki read-only period ends at: 2021-06-29 14:23:23.504447
* 14:23 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 14:23 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 14:22 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 14:22 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 14:22 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 14:22 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 14:21 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 14:21 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 14:21 jayme@cumin1001: MediaWiki read-only period starts at: 2021-06-29 14:21:26.671853
* 14:21 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 14:15 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 14:15 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 14:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 44 hosts with reason: DC switchover
* 14:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 44 hosts with reason: DC switchover
* 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 14:11 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 14:10 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 14:09 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 14:08 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 14:02 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 14:01 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 14:01 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 13:51 otto@deploy1002: Started deploy [analytics/refinery@edc31a2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@edc31a2]
* 13:49 otto@deploy1002: Finished deploy [analytics/refinery@edc31a2] (thin): Regular analytics weekly train THIN [analytics/refinery@edc31a2] (duration: 00m 07s)
* 13:49 otto@deploy1002: Started deploy [analytics/refinery@edc31a2] (thin): Regular analytics weekly train THIN [analytics/refinery@edc31a2]
* 13:49 otto@deploy1002: Finished deploy [analytics/refinery@edc31a2]: Regular analytics weekly train [analytics/refinery@COMMIT_HASH] (duration: 17m 42s)
* 13:35 volker-e@deploy1002: Finished deploy [design/style-guide@e97fccb]: Deploy design/style-guide: {{Gerrit|e97fccb}} styles: Add internationalization and accessibility note labels and treatments (#476) (duration: 00m 07s)
* 13:34 volker-e@deploy1002: Started deploy [design/style-guide@e97fccb]: Deploy design/style-guide: {{Gerrit|e97fccb}} styles: Add internationalization and accessibility note labels and treatments (#476)
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@edc31a2]: Regular analytics weekly train [analytics/refinery@COMMIT_HASH]
* 11:54 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:702095{{!}}vector: Finish enabling language switcher treatment A/B test on fawiki (T269093)]] (duration: 00m 56s)
* 11:38 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/Wikibase/repo/: Backport: [[gerrit:702018{{!}}Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634)]], Part II (duration: 00m 58s)
* 11:36 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/Wikibase/repo/includes/Rdf/PropertyStubRdfBuilder.php: Backport: [[gerrit:702018{{!}}Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634)]], Part I (duration: 00m 56s)
* 11:35 ladsgroup@deploy1002: sync-file aborted: Backport: [[gerrit:702018{{!}}Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634)]] (duration: 00m 10s)
* 10:30 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on acmechief* after switch towards nginx-light [[phab:T164456|T164456]]
* 09:27 moritzm: installing nettle security updates on buster
* 08:47 elukey: repool mw13[55,84] after debugging - [[phab:T285634|T285634]]
* 08:46 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1384.eqiad.wmnet
* 08:46 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1355.eqiad.wmnet
* 08:43 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 08:25 elukey: cumin 'A:mw-eqiad' '/usr/local/sbin/restart-php7.2-fpm' -b 2 -s 30 - [[phab:T285634|T285634]]
* 08:21 elukey: depool mw1355 (mw appserver) for debugging - [[phab:T285634|T285634]]
* 08:21 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1355.eqiad.wmnet
* 08:12 hashar: Upgrading Jenkins on contint2001 / contint1001 and restarting CI Jenkins # [[phab:T285531|T285531]]
* 08:03 hashar: Upgraded Jenkins on releases1002 / releases2002 # [[phab:T285531|T285531]]
* 08:02 hashar: Upgraded Jenkins on releases1002 / releases2002
* 07:50 godog: remove 20G migration data /root/prometheus from prometheus4001 - [[phab:T243057|T243057]]
* 07:48 godog: remove old /root/prometheus data from prometheus4001
* 07:05 moritzm: upgrading bullseye early installs to the latest state of testing [[phab:T275873|T275873]]
* 06:46 tstarling@deploy1002: Synchronized php-1.37.0-wmf.11/includes/MediaWiki.php: Add statsd action timing metric [[phab:T284274|T284274]] (duration: 00m 58s)
* 02:47 cdanis: ✔️ cdanis@cumin2001.codfw.wmnet ~ 🕥🍺 sudo cumin -b16 'A:cp-upload and A:codfw' 'run-puppet-agent -q'
* 02:34 ryankemper: [[phab:T285643|T285643]] Banned `elastic1039` from all 3 elasticsearch clusters and set `elastic1039.eqiad.wmnet` to failed in netbox
* 02:27 cdanis: ✔️ cdanis@cumin2001.codfw.wmnet ~ 🕥🍺 sudo cumin -b16 'A:cp-upload' 'run-puppet-agent -q'
* 02:25 eileen: civicrm revision changed from {{Gerrit|927ab7cff7}} to {{Gerrit|789c92d13b}}, config revision is {{Gerrit|1739c53fcb}}
* 02:04 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@0e916b1]: 0.3.75 (duration: 08m 40s)
* 01:57 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.75` on canary `wdqs1003`; proceeding to rest of fleet
* 01:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@0e916b1]: 0.3.75
* 01:50 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.75`. Pre-deploy tests passing on canary `wdqs1003`
* 00:25 Krinkle: krinkle@mwmaint1002: purgeParserCache.php --tag pc1, ref [[phab:T282761|T282761]]


== June 23 ==
== 2021-06-28 ==
* 23:38 logmsgbot: ori Finished scap: scapping to all apaches for --restart test (duration: 07m 03s)
* 23:07 urbanecm: Evening B&C window done
* 23:30 logmsgbot: ori Started scap: scapping to all apaches for --restart test
* 23:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5ec855d14b31a9392274c2bfe2e21e2ad44986bc}}: Enable Parsoid inspired media structure on test wikis ([[phab:T51097|T51097]]) (duration: 00m 59s)
* 23:24 bblack: nginxes all updated for ssl stapling bugfix
* 22:51 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 23:24 logmsgbot: ori Finished scap: scapping to scap-test dsh group for --restart test (duration: 06m 02s)
* 22:51 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 23:18 logmsgbot: ori Started scap: scapping to scap-test dsh group for --restart test
* 22:50 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 23:16 logmsgbot: ori scap aborted: scapping to scap-test dsh group for --restart test (duration: 00m 06s)
* 22:48 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 23:16 logmsgbot: ori Started scap: scapping to scap-test dsh group for --restart test
* 22:48 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 22:14 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php: RejectParserCacheValue may pass a WikiPage or Article (duration: 00m 13s)
* 22:44 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 22:07 mutante: tmp. disabling puppet on mw1033
* 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 21:53 logmsgbot: legoktm Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.class.php: (no message) (duration: 00m 15s)
* 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 21:50 logmsgbot: ori Synchronized php-1.26wmf11/includes/parser/ParserCache.php: (no message) (duration: 00m 12s)
* 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
* 21:40 mutante: starting instance planet1001 on ganeti1003 - cant get console
* 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
* 21:40 logmsgbot: legoktm Synchronized php-1.26wmf11/includes/parser/ParserCache.php: (no message) (duration: 00m 13s)
* 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 21:36 bd808: updated scap to 33f3002 (Ensure that the minimum batch size used by cluster_ssh is 1)
* 22:43 legoktm@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2021-06-28 22:43:04.512602
* 21:34 logmsgbot: ori Synchronized php-1.26wmf11/extensions/SyntaxHighlight_GeSHi: 3c8bb2c493: Update SyntaxHighlight_GeSHi for cherry-pick (duration: 00m 13s)
* 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 20:32 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.26wmf11
* 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 20:19 logmsgbot: mattflaschen Synchronized wmf-config/InitialiseSettings-labs.php: Beta-only change to add Flow_test to enwiki (duration: 00m 11s)
* 22:42 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 19:59 logmsgbot: ori scap failed: