Difference between revisions of "Server Admin Log"

imported>Stashbot
(jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T245202 cirrus: Move all move_like traffic to codfw (duration: 01m 02s))
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(480 intermediate revisions by 3 users not shown)
== 2020-02-14 ==
== 2021-08-03 ==
* 00:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245202|T245202]] cirrus: Move all move_like traffic to codfw (duration: 01m 02s)
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:51 jforrester@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: [[phab:T245202|T245202]] cirrus: Increase the pool counter limits a bit (duration: 01m 05s)
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root, causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root, causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root, causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master [[phab:T287574|T287574]]
* 09:12 moritzm: installing php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2020-02-13 ==
== 2021-08-02 ==
* 22:13 jeh: running filesystem tests on cloudvirt1024 [[phab:T241884|T241884]]
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:42 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 21:41 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:40 jbond42: refresh facts on compilers
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:38 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:37 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:35 ottomata: deploying production and canary releases for eventgate-logging-external (and destroying the 'logging-external' release) (safe because eventgate-logging-external is not in use) - [[phab:T245203|T245203]]
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:29 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:28 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 20:33 marxarelli: rollback to group1 due to 500 spike (2k/min) ([[phab:T233867|T233867]])
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 20:32 dduvall@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 20:30 marxarelli: varnish 500 spike. rolling back
* 21:31 tzatziki: removing 1 file for legal compliance
* 20:20 gehel: restarting blazegraph + updater on wdqs2006
* 21:16 tzatziki: removing 7 files for legal compliance
* 20:19 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.19
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 19:44 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/api/ApiRollback.php: [[phab:T245159|T245159]] ApiRollback: Properly deal with UserIdentity (duration: 01m 04s)
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/resourceloader/ResourceLoaderSkinModule.php: [[phab:T245182|T245182]] ResourceLoaderSkinModule: Don't hard-deprecate wgLogoHD just now (duration: 01m 03s)
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:17 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T219534|T219534]] Add new MLR models for Cirrus on zh/ja/kowiki (duration: 01m 03s)
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:10 moritzm: installing e2fsprogs security updates
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 18:48 bblack: ns1.wikimedia.org - re-routing back to authdns2001 instead of dns2002 on cr[12]-codfw - [[phab:T242017|T242017]]
* 19:00 urbanecm: Morning B&C window completed
* 18:38 bblack: authdns2001 - reboot - [[phab:T242017|T242017]]
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 18:36 bblack: ns1.wikimedia.org - re-routing from authdns2001 to dns2002 on cr[12]-codfw - [[phab:T242017|T242017]]
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 18:09 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I9d0c8af3c577}} (duration: 01m 06s)
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:00 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Iae1f45896}} (duration: 01m 06s)
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:59 volans: downtimed mgmt in eqiad for 1h
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 17:58 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Iae1f45896}} (duration: 01m 08s)
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:49 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Ibfca686f681}} (duration: 01m 06s)
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 17:41 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Iefff596955e}} (duration: 01m 08s)
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 17:40 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Iefff596955e}} (duration: 01m 06s)
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 17:35 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|I2e4fb0c086de0f8ac}} (duration: 01m 06s)
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:32 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I2e4fb0c086de0f8ac}} (duration: 01m 06s)
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 17:10 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: No-op (code style only) deploy sync (duration: 01m 07s)
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 17:09 jforrester@deploy1001: sync aborted: wmf-config/CommonSettings.php No-op (code style only) deploy sync (duration: 00m 04s)
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 17:09 jforrester@deploy1001: Started scap: wmf-config/CommonSettings.php No-op (code style only) deploy sync
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 16:32 robh: ps1-a8-codfw.mgmt.codfw.wmnet firmware upgraded via [[phab:T245164|T245164]]
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:28 papaul: rebooting elastic2043 for firmware upgrade
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 16:22 gehel: canceled the restart of elastic2043 - [[phab:T243715|T243715]]
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:21 gehel: restarting elastic2043 - [[phab:T243715|T243715]]
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 16:10 _joe_: depooling/repooling mw1240
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 16:02 _joe_: pooled mw1238 again
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:59 _joe_: depooling mw1238 for analysis
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:42 vgutierrez: rolling restart of ats-be on esams - [[phab:T170567|T170567]]
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 15:38 vgutierrez: disable allow_half_open on ats-tls @ cp4031 - [[phab:T236458|T236458]]
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 15:27 vgutierrez: turning on TLSv1.3 between ats-be and applayer in cp30[51-52] - [[phab:T170567|T170567]]
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 15:22 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/WikibaseMediaInfo/resources/: UBN fix: Force non-value to be undefined (duration: 01m 06s)
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 14:51 vgutierrez: test TLSv1.3 between ats-be and applayer in cp3050 - [[phab:T170567|T170567]]
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 14:47 XioNoX: re-image rpki2001 - [[phab:T244585|T244585]]
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 14:33 XioNoX: add routinator_0.6.4_amd64.deb to buster-wikimedia apt repo
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10405 and previous config saved to /var/cache/conftool/dbconfig/20200213-142735-marostegui.json
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:24 XioNoX: re-enable ping offload in esams - [[phab:T244584|T244584]]
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:31 XioNoX: disable ping offload in esams - [[phab:T244584|T244584]]
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:24 XioNoX: re-enable ping offload in eqiad - [[phab:T244584|T244584]]
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:06 XioNoX: disable ping offload in eqiad - [[phab:T244584|T244584]]
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 13:03 XioNoX: re-enable ping offload in codfw - [[phab:T244584|T244584]]
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 13:00 vgutierrez: pool cp10[75,76] running buster - [[phab:T242093|T242093]]
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 12:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:20 mutante: gerrit servers: disabling puppet
* 12:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 12:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 12:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 12:34 Amir1: EU SWAT is done
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 12:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571961{{!}}Read and write more in the new term store]], take II, the cache issue ([[phab:T219123|T219123]] [[phab:T225055|T225055]]) (duration: 01m 03s)
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 12:29 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571961{{!}}Read and write more in the new term store]] (duration: 01m 03s)
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 12:29 vgutierrez: depool cp10[75,76] and reimage as buster - [[phab:T242093|T242093]]
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 12:28 vgutierrez: pool cp10[77,78] running buster - [[phab:T242093|T242093]]
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 12:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571956{{!}}Revert: Triple the factor of WDQS lag to maxlag for Wikidata (T244722)]] (duration: 01m 04s)
* 11:27 hashar: restarting Jenkins on contint2001
* 12:18 XioNoX: re-image ping2001 to buster - [[phab:T244584|T244584]]
* 11:27 hashar: restarting Jenkins on contint1001
* 12:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|1c81925}}: Create Test Custodians group at Beta Wikiversity ([[phab:T240438|T240438]]) (duration: 01m 07s)
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:13 XioNoX: disable ping offload in codfw
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 12:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:13 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 12:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0f035e4}}: Update wgAvailableRights declaration of autoreviewprotected ([[phab:T230103|T230103]]) (duration: 01m 03s)
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:13 urbanecm: EU B&C window completed
* 12:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|176b0e8}}: Grant autopatrol to azwiki patrollers ([[phab:T244338|T244338]]) (duration: 01m 05s)
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 11:53 vgutierrez: depool cp10[77,78] and reimage as buster - [[phab:T242093|T242093]]
* 11:08 moritzm: installing openjdk-11 security updates
* 11:52 vgutierrez: pool cp10[79,80] running buster - [[phab:T242093|T242093]]
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 11:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 11:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:24 moritzm: installing libsndfile security updates on buster
* 11:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:12 moritzm: installing aspell security updates
* 11:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 vgutierrez: rolling upgrade of ATS to version 8.0.5-1wm16 fleet wide - [[phab:T244464|T244464]]
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:16 vgutierrez: depool cp10[79,80] and reimage as buster - [[phab:T242093|T242093]]
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)
* 11:12 ema: A:cp re-enable puppet, leave it to cron to apply wikimedia-common/wikimedia-frontend VCL merge [[phab:T241239|T241239]]
* 11:08 vgutierrez: upload trafficserver 8.0.5-1wm16 to apt.wm.o (buster) - [[phab:T244464|T244464]]
* 11:02 vgutierrez: pool cp10[81,82] and reimage as buster - [[phab:T242093|T242093]]
* 10:59 ema: cp4021 (cache_upload): apply wikimedia-common/wikimedia-frontend VCL merge [[phab:T241239|T241239]]
* 10:49 ema: cp4027 (cache_text): apply wikimedia-common/wikimedia-frontend VCL merge [[phab:T241239|T241239]]
* 10:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:23 vgutierrez: removing /root/.ssh/known_hosts in cumin1001
* 10:21 vgutierrez: pool cp10[83,84] running buster - [[phab:T242093|T242093]]
* 10:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:45 vgutierrez: depool cp10[83,84] and reimage as buster - [[phab:T242093|T242093]]
* 09:45 vgutierrez: pool cp10[85,86] running buster - [[phab:T242093|T242093]]
* 09:10 moritzm: installing Java security updates on elastic* and relforge*
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1107 50 -> 100 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10403 and previous config saved to /var/cache/conftool/dbconfig/20200213-085957-marostegui.json
* 08:57 gehel: restart elasticsearch on elastic2051 - JVM upgrade
* 08:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:57 moritzm: installing Java security updates on Hadoop, Kafka/Jumbo, AQS and Druid canaries
* 07:57 vgutierrez: depool cp10[85,86] and reimage as buster - [[phab:T242093|T242093]]
* 07:53 moritzm: rolling restart of restbase-dev to pick up Java security update
* 07:49 vgutierrez: pool cp10[87,88] running buster - [[phab:T242093|T242093]]
* 07:49 vgutierrez: testing ATS 8.0.5-1wm16 + KA between ats-tls and varnish-fe in cp4031 - [[phab:T244464|T244464]]
* 07:47 moritzm: installing Java security updates on stat/SWAP hosts
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 50 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10402 and previous config saved to /var/cache/conftool/dbconfig/20200213-072839-marostegui.json
* 07:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:03 vgutierrez: depool cp10[87,88] and reimage as buster - [[phab:T242093|T242093]]
* 07:02 vgutierrez: pool cp10[89,90] running buster - [[phab:T242093|T242093]]
* 06:49 vgutierrez: pool cp20[02,05] running buster - [[phab:T242093|T242093]]
* 06:36 marostegui: Upgrade and compress db1087, this will generate lag on s8 on the wiki replicas - [[phab:T232446|T232446]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for compression - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10401 and previous config saved to /var/cache/conftool/dbconfig/20200213-063535-marostegui.json
* 06:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1099:3318 into vslow for s8 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10400 and previous config saved to /var/cache/conftool/dbconfig/20200213-063334-marostegui.json
* 06:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10399 and previous config saved to /var/cache/conftool/dbconfig/20200213-063207-marostegui.json
* 06:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10398 and previous config saved to /var/cache/conftool/dbconfig/20200213-062642-marostegui.json
* 06:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10397 and previous config saved to /var/cache/conftool/dbconfig/20200213-062148-marostegui.json
* 06:19 vgutierrez: testing a new build of ATS 8.0.6 in cp40[26,32]
* 06:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10396 and previous config saved to /var/cache/conftool/dbconfig/20200213-061219-marostegui.json
* 06:11 vgutierrez: depool cp10[89,90] and reimage as buster - [[phab:T242093|T242093]]
* 06:04 vgutierrez: depool cp20[02,05] and reimage as buster - [[phab:T242093|T242093]]
* 06:04 vgutierrez: pool cp20[01,08] running buster - [[phab:T242093|T242093]]
* 06:02 twentyafterfour: set phabricator read-only to false
* 06:01 twentyafterfour: set phabricator read-only
* 06:00 marostegui: Start phabricator maintenance [[phab:T244566|T244566]]
* 05:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:53 marostegui: Upgrade db1128 without restarting mysql - [[phab:T244566|T244566]]
* 05:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:47 marostegui: Silence m3 hosts for maintenance - [[phab:T244566|T244566]]
* 05:38 vgutierrez: depool cp2008 and reimage as buster - [[phab:T242093|T242093]]
* 05:37 vgutierrez: pool cp2011 running buster - [[phab:T242093|T242093]]
* 05:35 vgutierrez: depool cp2001 and reimage as buster - [[phab:T242093|T242093]]
* 05:34 vgutierrez: pool cp2004 running buster - [[phab:T242093|T242093]]
* 05:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:09 vgutierrez: depool cp20[04,11] and reimage as buster - [[phab:T242093|T242093]]
* 03:57 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:57 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:54 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:32 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:30 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:28 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 02:44 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 02:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 02:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 02:33 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:10 twentyafterfour: no apparent problems with phabricator upgrade, all done
* 01:01 twentyafterfour: starting phabricator deploy, momentary downtime expected while apache restarts
* 00:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:56 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:54 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:45 niharika29@deploy1001: Synchronized wmf-config/throttle.php: Throttle rule for National Gallery of Canada Library and Archives edit-a-thon - [[phab:T244488|T244488]] (duration: 01m 07s)
* 00:36 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:08 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:06 pt1979@cumin2001: START - Cookbook sre.hosts.downtime


== 2020-02-12 ==
== 2021-07-31 ==
* 23:46 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 23:44 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
* 23:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:11 XioNoX: deactivate BGP to office's router1 while it's on maintenance
* 21:59 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:58 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:57 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:53 chaomodus: restart nagios-nrpe-service on cumin1001 after it had oomed
* 21:51 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:51 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:18 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:18 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 21:10 marxarelli: completed group1 to 1.35.0-wmf.19
* 21:00 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.19 (duration: 01m 03s)
* 20:59 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.19
* 20:49 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T232563|T232563]] - Remove SERVER_SOFTWARE override (duration: 01m 03s)
* 20:39 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T72470|T72470]] - Disable wgLegacyJavaScriptGlobals on svwiki (duration: 01m 08s)
* 19:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Don't use hex escapes in the name of cawiki (duration: 01m 04s)
* 19:47 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T243503|T243503]] [itwiki] Move assignment of 'mover' group from sysops to bureaucrats (duration: 01m 02s)
* 19:42 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T243509|T243509]] [zh_classicalwiki] Enable new user message for auto-created accounts (duration: 01m 03s)
* 19:38 James_F: Ran mwscript maintenance/namespaceDupes.php --wiki=mywiki --fix and mwscript maintenance/namespaceDupes.php --wiki=mywiktionary --fix on mwmaint1002
* 19:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244980|T244980]] Localise $wgMetaNamespace for mywiki and mywiktionary (duration: 01m 03s)
* 19:30 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244205|T244205]] [newiki] Set local timezone to Kathmandu (duration: 01m 03s)
* 19:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T241883|T241883]] [fywiktionary] Set a local wgSitename (duration: 01m 03s)
* 19:12 jforrester@deploy1001: Synchronized wmf-config/throttle-analyze.php: Replace deprecated IP class with IPUtils (no-op sync) (duration: 01m 03s)
* 18:31 mutante: irc2001 - manually run the "$<nowiki>{</nowiki>v6_token_cmd<nowiki>}</nowiki> && $<nowiki>{</nowiki>v6_flush_dyn_cmd<nowiki>}</nowiki>" commands from interface::add_ip6_mapped to debug 'Interface::Add_ip6_mapped[main]/Augeas[ens5_v6_token]: Could not evaluate: Saving failed' but it does not reproduce the puppet error ... ([[phab:T244719|T244719]])
* 17:57 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/pager/IndexPager.php: [[phab:T244941|T244941]] IndexPager: Cast properties passed to implode to arrays (duration: 01m 03s)
* 17:27 jeh: upgrade RAID firmware on cloudvirt1024 to 25.5.6.0009 [[phab:T241884|T241884]]
* 17:22 bblack: ns1.wikimedia.org - re-route back to original authdns2001 destination
* 17:11 brennen: restarting jenkins for updates
* 17:09 vgutierrez: disabling KA between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 17:01 vgutierrez: rolling back cp4026 and cp4032 to trafficserver 8.0.5-1wm15
* 17:00 vgutierrez: depool cp40[26,32]
* 16:53 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:52 vgutierrez: pool cp20[06,14] running buster - [[phab:T242093|T242093]]
* 16:51 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 16:49 moritzm: installing openjpeg2 security updates
* 16:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:56 vgutierrez: Enable KA and disable parent proxies on cp4031 - [[phab:T244464|T244464]]
* 15:50 vgutierrez: depool cp20[06,14] and reimage as buster - [[phab:T242093|T242093]]
* 15:49 volans: spicerack upgraded to 0.0.30-1 on both cumin hosts
* 15:48 vgutierrez: pool cp20[07,17] running buster - [[phab:T242093|T242093]]
* 15:46 bblack: authdns2001 - shutting down for hardware work - [[phab:T242017|T242017]]
* 15:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:39 jeh: clearing foreign drive RAID configuration on cloudvirt1024 [[phab:T241884|T241884]]
* 15:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:32 marostegui: Disable event handler for db1095 RAID check on icinga - [[phab:T244958|T244958]]
* 15:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:25 jeh: upgrade BIOS firmware on cloudvirt1024 to 2.4.8 [[phab:T241884|T241884]]
* 15:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:02 vgutierrez: depool cp20[07,17] and reimage as buster - [[phab:T242093|T242093]]
* 14:34 XioNoX: repool eqsin
* 14:31 moritzm: reimage logstash2026 to test new standard RAID0 partman recipe
* 14:00 vgutierrez: pool cp20[10,18] running buster - [[phab:T242093|T242093]]
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10393 and previous config saved to /var/cache/conftool/dbconfig/20200212-135514-marostegui.json
* 13:39 akosiaris: revert sessionstore on mw1331, mw1348 so that it times out instead of returning TCP RSTs. Testing for [[phab:T243106|T243106]]
* 13:36 XioNoX: re-enable transit/peering on cr1-eqsin - [[phab:T244944|T244944]]
* 13:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:23 akosiaris: mangle sessionstore on mw1331, mw1348 so that it times out instead of returning TCP RSTs. Testing for [[phab:T243106|T243106]]
* 13:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:22 XioNoX: cr1-eqsin RE failover (final) - [[phab:T244944|T244944]]
* 13:21 marostegui: Restart wikibugs as phab comments aren't showing up on irc - [[phab:T241109|T241109]]
* 13:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:18 jynus: setting up db1140 under maintenance (upgrade, reboot, disable alerts)
* 13:15 vgutierrez: disabling KA between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 13:10 moritzm: upgrading debdeploy fleet-wide to 0.0.99.13
* 13:08 moritzm: uploaded libapache2-mod-auth-cas 1.2-1~deb8u1 for jessie-wikimedia to apt.wikimedia.org
* 13:05 vgutierrez: depool cp20[10,18] and reimage as buster - [[phab:T242093|T242093]]
* 13:05 vgutierrez: pool cp20[12,20] running buster - [[phab:T242093|T242093]]
* 12:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:53 XioNoX: cr1-eqsin RE failover - [[phab:T244944|T244944]]
* 12:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:35 vgutierrez: depool cp20[12,20] and reimage as buster - [[phab:T242093|T242093]]
* 12:34 vgutierrez: pool cp20[13,22] running buster - [[phab:T242093|T242093]]
* 12:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:21 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571705{{!}}Triple the factor of WDQS lag to maxlag for Wikidata (T244722)]], take II, the cache issue (duration: 01m 03s)
* 12:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571705{{!}}Triple the factor of WDQS lag to maxlag for Wikidata (T244722)]] (duration: 01m 04s)
* 12:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:12 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571412{{!}}Enable ContentTranslation out of beta in bs and mk WPs (T244139, T244140)]] (duration: 01m 15s)
* 12:08 vgutierrez: depool cp2013 and reimage as buster - [[phab:T242093|T242093]]
* 12:06 vgutierrez: pool cp2016 running buster - [[phab:T242093|T242093]]
* 12:01 vgutierrez: depool cp20[16,22] and reimage as buster - [[phab:T242093|T242093]]
* 11:57 vgutierrez: pool cp20[19,24] running buster - [[phab:T242093|T242093]]
* 11:53 akosiaris: mangle sessionstore on mw1331 so that it is unreachable. Testing for [[phab:T243106|T243106]]
* 11:49 vgutierrez: repooling cp40[26,32]
* 11:39 vgutierrez: pool cp3050 running buster - [[phab:T242093|T242093]]
* 11:37 vgutierrez: depooling cp[4026,4032]
* 11:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:18 vgutierrez: depool cp2024 and reimage as buster - [[phab:T242093|T242093]]
* 11:17 vgutierrez: pool cp2025 running buster - [[phab:T242093|T242093]]
* 11:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:15 vgutierrez: depool cp2016 and reimage as buster - [[phab:T242093|T242093]]
* 11:14 vgutierrez: pool cp2019 running buster - [[phab:T242093|T242093]]
* 11:11 moritzm: reimage logstash2026 to test new standard RAID0 partman recipe
* 11:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:50 vgutierrez: depool cp3050 and reimage as buster - [[phab:T242093|T242093]]
* 10:49 vgutierrez: pool cp30[51,52] running buster - [[phab:T242093|T242093]]
* 10:45 vgutierrez: depool cp20[19,25] and reimage as buster - [[phab:T242093|T242093]]
* 10:42 vgutierrez: pool cp2026 running buster - [[phab:T242093|T242093]]
* 10:36 vgutierrez: pool cp2023 running buster - [[phab:T242093|T242093]]
* 10:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:34 moritzm: bouncing ferm on ganeti1016, failed to start after boot
* 10:32 vgutierrez: Enable KA between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 10:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:12 vgutierrez: testing trafficserver 8.0.6-rc0 in cp40[26,32]
* 10:06 vgutierrez: depool cp20[23,26] and reimage as buster - [[phab:T242093|T242093]]
* 10:01 vgutierrez: depool cp30[51-52] and reimage as buster - [[phab:T242093|T242093]]
* 09:38 ema: cp: rolling ats-tls-restart to enable analytics logging [[phab:T237993|T237993]]
* 09:26 ema: cp4027: ats-tls-restart to enable analytics logging to pipe [[phab:T237993|T237993]]
* 09:25 moritzm: rolling restart of cassandra on restbase-dev to pick up Java security updates
* 09:17 marostegui: Failover m2 master dbproxy from dbproxy1007 to dbproxy1013 - [[phab:T202367|T202367]]
* 09:13 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:11 marostegui: Upgrade and reboot dbproxy1013 before making it master - [[phab:T202367|T202367]]
* 08:55 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:46 phedenskog@deploy1001: Finished deploy [performance/navtiming@9bbbb58]: (no justification provided) (duration: 00m 05s)
* 08:46 phedenskog@deploy1001: Started deploy [performance/navtiming@9bbbb58]: (no justification provided)
* 08:38 marostegui: Restart wikibugs as it doesn't show phab comments on irc - [[phab:T241109|T241109]]
* 08:21 moritzm: installing mesa security updates
* 07:28 vgutierrez: pool cp30[53-54] running buster - [[phab:T242093|T242093]]
* 07:18 oblivian@puppetmaster1001: conftool action : set/weight=30; selector: dc=eqiad,pool=appserver,name=mw132[3-4].*
* 07:16 oblivian@puppetmaster1001: conftool action : set/weight=20; selector: dc=eqiad,pool=appserver,service=nginx,name=mw12[3-5].*
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 20 for  10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10391 and previous config saved to /var/cache/conftool/dbconfig/20200212-070250-marostegui.json
* 06:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:46 marostegui: Redact ngwikimedia on db1124:3313 and db2094:3313 [[phab:T240772|T240772]]
* 06:22 vgutierrez: depool cp30[53-54] and reimage as buster - [[phab:T242093|T242093]]
* 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:17 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:16 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 01:48 XioNoX: disabling peering session on cr1-eqsin (they're flapping otherwise)
* 00:44 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/page/ImageHistoryPseudoPager.php: [[phab:T244937|T244937]] ImageHistoryPseudoPager: Update doQuery() for IndexPager changes (duration: 01m 03s)
* 00:38 XioNoX: reboot cr1-eqsin
* 00:33 XioNoX: commit full on cr1-eqsin - [[phab:T243080|T243080]]
* 00:21 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: rm wgKartographerIconServer (duration: 01m 02s)
* 00:20 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: rm wgKartographerIconServer (duration: 01m 03s)
* 00:16 eileen: civicrm revision changed from {{Gerrit|ee9edf8137}} to {{Gerrit|55b2afb6eb}}, config revision is {{Gerrit|561ae21f77}}


== 2020-02-11 ==
== 2021-07-30 ==
* 22:04 XioNoX: switchover RE mastership back to re0 on cr1-eqsin - [[phab:T243080|T243080]]
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:50 XioNoX: reboot re0:cr1-eqsin (backup) - [[phab:T243080|T243080]]
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 21:45 cdanis: repool eqiad
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 21:37 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^cp107.*
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 21:36 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^cp108.*
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 21:36 bblack: re-pooling all cp10xx in eqiad
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 21:32 XioNoX: switchover RE mastership on cr1-eqsin - [[phab:T243080|T243080]]
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 21:14 robh: cp1067 powered back into service post firmware update via [[phab:T243167|T243167]]
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 21:11 cdanis: depool eqiad
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:01 marxarelli: completed group0 to 1.35.0-wmf.19 ([[phab:T233867|T233867]])
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 20:57 robh: cp108[45] returned to service, depooling cp108[67] for firmware update via [[phab:T243167|T243167]]
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 20:54 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.19
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:53 mutante: gerrit - moving gerrit db_pass from private module passwords to private hieradata
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 20:51 XioNoX: reboot backup RE on cr1-eqsin - [[phab:T243080|T243080]]
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 20:38 robh: depooling cp108[45] for firmware update via [[phab:T243167|T243167]]
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:32 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.19 and rebuild l10n cache (duration: 37m 31s)
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:19 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:  (duration: 00m 02s)
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:19 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 20:18 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:  (duration: 00m 03s)
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:18 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 20:08 XioNoX: depool eqsin for router upgrade - [[phab:T243080|T243080]]
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 20:01 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:  (duration: 00m 04s)
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 20:01 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 19:55 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.19 and rebuild l10n cache
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:43 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.16 (duration: 01m 48s)
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 19:42 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.15 (duration: 01m 51s)
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:38 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.14 (duration: 02m 08s)
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 19:36 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.11 (duration: 10m 53s)
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 19:35 marxarelli: running `scap clean --delete` for old wmf branches wmf.11, wmf.14, wmf.15, wmf.16 ([[phab:T233867|T233867]])
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 19:03 volans: uploaded spicerack_0.0.30-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 19:00 Urbanecm: Create User:Ammarpad on ngwikimedia and promote to sysop, bureaucrat ([[phab:T240771|T240771]])
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 18:48 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.18
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 18:43 twentyafterfour: getting ready to deploy wmf.18 refs  [[phab:T233866|T233866]]
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 18:42 greg-g: restarting stashbot
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 18:35 bblack: ns1.wikimedia.org - changing static route destination on cr[12]-codfw from authdns2001 to dns2002 - [[phab:T242017|T242017]]
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 18:33 Urbanecm: Create ngwikimedia is done ([[phab:T240771|T240771]])
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 03s)
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 18:24 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 06s)
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 18:21 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Create ngwikimedia ([[phab:T240771|T240771]])
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 18:20 dpifke@deploy1001: Finished deploy [performance/navtiming@b471b64]: (no justification provided) (duration: 00m 05s)
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 18:20 dpifke@deploy1001: Started deploy [performance/navtiming@b471b64]: (no justification provided)
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 18:19 urbanecm@deploy1001: Synchronized dblists/: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 06s)
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 17:57 bblack: reboot dns2002 post-reimaging
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 17:13 vgutierrez: Disable KA on cp4031 - [[phab:T244464|T244464]]
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 16:49 vgutierrez: pool cp3055 running buster - [[phab:T242093|T242093]]
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 16:43 vgutierrez: repooling cp4031
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 16:38 vgutierrez: depooling cp4031 for some KA tests
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 16:25 vgutierrez: pool cp3056 running buster - [[phab:T242093|T242093]]
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 16:23 bblack: dns2002 - shutting down for hardware work and reinstall - [[phab:T242017|T242017]]
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 16:21 bblack: dns2002 - stopping bird adverts to depool service for [[phab:T242017|T242017]]
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 16:20 bblack: dns2002 - downtimed in icinga for [[phab:T242017|T242017]]
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 16:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:23 moritzm: installing libsndfile security updates on stretch
* 16:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 15:38 vgutierrez: depool cp3056 and reimage as buster - [[phab:T242093|T242093]]
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 15:36 vgutierrez: pool cp3058 running buster - [[phab:T242093|T242093]]
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 15:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Configuring test.event stream in beta, no-op in prod - [[phab:T242122|T242122]] (duration: 01m 08s)
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 15:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 15:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 15:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 15:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 14:58 vgutierrez: depool cp3055 and reimage as buster - [[phab:T242093|T242093]]
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 14:56 vgutierrez: pool cp3057 running buster - [[phab:T242093|T242093]]
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 14:52 moritzm: pruning old CAS logs (predating the current logger config for /var/log/cas/*) from idp1001/idp2001
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 14:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 14:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 14:21 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=labswiki --force "Ladsgroup" --custom-groups checkuser
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json
* 14:20 vgutierrez: restart varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 14:07 vgutierrez: depool cp3057 and cp3058 and reimage as buster - [[phab:T242093|T242093]]
* 13:52 vgutierrez: pool cp3059 and cp3060 running buster - [[phab:T242093|T242093]]
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10382 and previous config saved to /var/cache/conftool/dbconfig/20200211-130343-marostegui.json
* 12:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:34 Amir1: EU SWAT is done
* 12:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:28 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:571339{{!}}Fix typo in the config name (T244697)]], take II, cache (duration: 01m 06s)
* 12:26 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:571339{{!}}Fix typo in the config name (T244697)]] (duration: 01m 05s)
* 12:12 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571338{{!}}Stop reading for the new term store as the default of client wikis (T244697)]], Second round, cache issue (duration: 01m 07s)
* 12:10 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571338{{!}}Stop reading for the new term store as the default of client wikis (T244697)]] (duration: 01m 11s)
* 12:04 vgutierrez: depool cp3059 and cp3060 and reimage as buster - [[phab:T242093|T242093]]
* 11:59 vgutierrez: repool cp3061 and cp3062 running buster - [[phab:T242093|T242093]]
* 11:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:20 vgutierrez: ats-tls effectively reusing connections between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:56 vgutierrez: depool cp3062 and reimage as buster - [[phab:T242093|T242093]]
* 10:54 vgutierrez: repool cp3064 running buster - [[phab:T242093|T242093]]
* 10:51 vgutierrez: depool cp3061 and reimage as buster - [[phab:T242093|T242093]]
* 10:50 vgutierrez: repool cp5006 and cp3063 running buster - [[phab:T242093|T242093]]
* 10:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:25 mvolz@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:18 mvolz@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 10:11 mvolz@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 10:07 vgutierrez: rolling restart of ats-tls in ulsfo - [[phab:T244464|T244464]]
* 09:57 vgutierrez: depool cp3063 and cp3064 and reimage as buster - [[phab:T242093|T242093]]
* 09:52 vgutierrez: depool cp5006 and reimage as buster - [[phab:T242093|T242093]]
* 09:52 vgutierrez: pool cp5007 running buster - [[phab:T242093|T242093]]
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1107 weight from 10 to 11', diff saved to https://phabricator.wikimedia.org/P10380 and previous config saved to /var/cache/conftool/dbconfig/20200211-083812-marostegui.json
* 08:25 marostegui: Upgrade db1095:3312, db1095:3313
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10379 and previous config saved to /var/cache/conftool/dbconfig/20200211-082204-marostegui.json
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10378 and previous config saved to /var/cache/conftool/dbconfig/20200211-081421-marostegui.json
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight from 5 to 10 for db1107 - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10377 and previous config saved to /var/cache/conftool/dbconfig/20200211-081319-marostegui.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10376 and previous config saved to /var/cache/conftool/dbconfig/20200211-080458-marostegui.json
* 07:57 akosiaris: [[phab:T242705|T242705]] systemctl stop uwsgi-ores on ores2001.
* 07:54 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10375 and previous config saved to /var/cache/conftool/dbconfig/20200211-075358-marostegui.json
* 07:47 marostegui: Upgrade es1013 - [[phab:T239791|T239791]]
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1013 - [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10374 and previous config saved to /var/cache/conftool/dbconfig/20200211-074358-marostegui.json
* 07:23 vgutierrez: depool cp5007 and reimage as buster - [[phab:T242093|T242093]]
* 07:22 vgutierrez: pool cp5001 and cp5008 running buster - [[phab:T242093|T242093]]
* 07:21 marostegui: Remove partitions from db2086:3318 - [[phab:T239453|T239453]]
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10373 and previous config saved to /var/cache/conftool/dbconfig/20200211-071936-marostegui.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10372 and previous config saved to /var/cache/conftool/dbconfig/20200211-071639-marostegui.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10371 and previous config saved to /var/cache/conftool/dbconfig/20200211-070720-marostegui.json
* 07:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:59 marostegui: Stop haproxy on dbproxy1001 - [[phab:T244463|T244463]]
* 06:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:48 marostegui: Remove grants in m1 for dbproxy1001 - [[phab:T231280|T231280]]
* 06:25 vgutierrez: depool cp5001 & cp5008 and reimage as buster - [[phab:T242093|T242093]]
* 06:18 marostegui: Failover m1-master from dbproxy1014 to dbproxy1012 - [[phab:T202367|T202367]]
* 00:26 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.18/skins/MinervaNeue: SWAT: Revert: Reduce userContributions icon code (duration: 01m 06s)
* 00:20 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Give NS_HELP same weight as NS_MAIN in search on wikitech (duration: 01m 06s)
* 00:15 ebernhardson@deploy1001: Synchronized wmf-config/: SWAT: Enable SpecialMute page on all wikis (duration: 01m 06s)


== 2020-02-10 ==
== 2021-07-29 ==
* 23:30 robh: cp108[23] returned to service via [[phab:T243167|T243167]]
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 23:28 legoktm: restarting zuul
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 23:26 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/OATHAuth/src/Key/TOTPKey.php: [[phab:T244308|T244308]] (duration: 01m 04s)
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 23:25 reedy@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/OATHAuth/src/Key/TOTPKey.php: [[phab:T244308|T244308]] (duration: 01m 07s)
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 23:06 robh: cp108[01] returned to service, cp108[23] offline for bios update via [[phab:T243167|T243167]]
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]]
* 22:50 chasemp: phab1001:~# sudo /srv/phab/phabricator/bin/bulk make-silent --id 2164
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 22:45 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add authevents as monolog channel (duration: 01m 06s)
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 22:43 robh: cp107[789] returned to service, cp108[01] offline for bios update via [[phab:T243167|T243167]]
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 22:42 robh: cp107[89] returned to service, cp108[01] offline for bios update via [[phab:T243167|T243167]]
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 21:58 robh: cp107[56] returned to service, cp107[78] offline for bios update via [[phab:T243167|T243167]]
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 21:43 arlolra: Updated Parsoid to {{Gerrit|612106d2}} ([[phab:T244412|T244412]], [[phab:T244413|T244413]], [[phab:T242746|T242746]], [[phab:T235273|T235273]], [[phab:T235307|T235307]], [[phab:T238845|T238845]], [[phab:T204618|T204618]], [[phab:T240054|T240054]])
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 21:38 robh: cp1075 & cp1076 offline for bios updates per [[phab:T243167|T243167]]
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 21:36 robh: cp1075 and cp1076 going offline for bios updates. This will cause a bit of cp irc icinga noise, but no paging. Not putting into maint mode, as there is no way to maint mode the noisiest check (which checks all backends and thus shouldn't be disabled)
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 21:33 arlolra@deploy1001: Finished deploy [parsoid/deploy@d2d4870]: Updating Parsoid to {{Gerrit|612106d2}} (duration: 10m 26s)
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 21:32 XioNoX: clamp tcp-mss on cr2-eqiad:xe-3/3/3
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 21:23 arlolra@deploy1001: Started deploy [parsoid/deploy@d2d4870]: Updating Parsoid to {{Gerrit|612106d2}}
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 21:12 halfak@deploy1001: Finished deploy [ores/deploy@a6f4f14]: [[phab:T242705|T242705]] (duration: 12m 18s)
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 21:00 halfak@deploy1001: Started deploy [ores/deploy@a6f4f14]: [[phab:T242705|T242705]]
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 20:55 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/MachineVision: MachineVision: Fix page id parsing from imageinfo results ([[phab:T244752|T244752]]) (duration: 01m 11s)
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 20:14 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/MachineVision: MachineVision: Fix page id parsing from imageinfo results ([[phab:T244752|T244752]]) (duration: 01m 15s)
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 19:31 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:570393]] Config: Session Store: Switch group0 and group1 to kask-session [[phab:T243106|T243106]] (duration: 01m 06s)
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 19:28 mutante: Gerrit - added eevans to 'wmf-deployment' group ([[phab:T244508|T244508]])
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 19:12 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T242122|T242122]] Load new EventStreamConfig extension if so configured (duration: 01m 06s)
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 19:07 jforrester@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 19:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T242122|T242122]] Set default of wmgUseEventStreamConfig false everywhere (duration: 01m 06s)
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 18:39 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]] (duration: 01m 05s)
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 18:38 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 18:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.18 refs [[phab:T233867|T233867]]
* 14:11 vgutierrez: restart pybal on lvs2009
* 18:21 twentyafterfour: MediaWiki train: finally moving forward with group0 wikis to 1.35.0-wmf.18 refs [[phab:T233866|T233866]]
* 14:09 vgutierrez: restart pybal on lvs2010
* 17:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T244561|T244561]] Set Kartographer servers to Wikimedia servers (duration: 01m 06s)
* 14:07 vgutierrez: restart pybal on lvs2008
* 16:48 moritzm: installing libexif security updates on jessie
* 14:05 vgutierrez: restart pybal on lvs2007
* 16:22 vgutierrez: pooling cp5002 and cp5009 running buster - [[phab:T242093|T242093]]
* 13:59 vgutierrez: restart pybal on lvs1014
* 15:45 XioNoX: push outbound flowspec support to core routers
* 13:55 vgutierrez: restart pybal on lvs1015
* 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after first day of 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10366 and previous config saved to /var/cache/conftool/dbconfig/20200210-154552-marostegui.json
* 13:52 _joe_: restarting pybal on lvs1016
* 15:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 15:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 15:33 godog: roll restart cassandra on session* to apply logging changes - [[phab:T242585|T242585]]
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 15:23 moritzm: uploading debdeploy 0.0.99.13 to apt.wikimedia.org
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 15:22 godog: roll restart cassandra on restbase* to apply logging changes - [[phab:T242585|T242585]]
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 15:19 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 15:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 15:19 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 15:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 15:06 marostegui: Reload haproxy on dbproxy1017 and dbproxy1017 - [[phab:T244209|T244209]]
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 15:04 twentyafterfour@deploy1001: Finished scap: full scap sync prior to wmf.18 rollout (duration: 20m 13s)
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 15:04 godog: roll restart cassandra on maps* to apply logging changes - [[phab:T242585|T242585]]
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 15:03 vgutierrez: rolling restart of ats-tls - [[phab:T240950|T240950]]
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 15:00 marostegui: Restart mysql on m5 master (wikitech will go down) - [[phab:T244209|T244209]]
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:52 vgutierrez: rolling restart of ats-tls in ulsfo - [[phab:T244464|T244464]]
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:46 vgutierrez: depool cp5002 and cp5009 and reimage as buster - [[phab:T242093|T242093]]
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 14:44 twentyafterfour@deploy1001: Started scap: full scap sync prior to wmf.18 rollout
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 14:42 vgutierrez: repool cp5003 and cp5010 running buster - [[phab:T242093|T242093]]
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 14:41 marostegui: Full-upgrade db1133 (without restarting mysql) - [[phab:T244209|T244209]]
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 14:40 twentyafterfour: MediaWiki Train: Running a full scap to prepare for moving forward to 1.35.0-wmf.18 ( [[phab:T233866|T233866]] )
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:32 marostegui: Downtime m5 hosts for the upcoming maintenance - [[phab:T244209|T244209]]
* 07:52 moritzm: restarting Tomcat on idp-test
* 14:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 14:17 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:11 XioNoX: remove TCP-MSS clamping on cr3-knams
* 13:48 vgutierrez: depool cp5003 and reimage as buster - [[phab:T242093|T242093]]
* 13:47 vgutierrez: pooling cp5004 with buster - [[phab:T242093|T242093]]
* 13:46 vgutierrez: depool cp5010 and reimage as buster - [[phab:T242093|T242093]]
* 13:45 vgutierrez: pooling cp5011 with buster - [[phab:T242093|T242093]]
* 13:28 godog: roll restart cassandra on aqs to apply logging changes - [[phab:T242585|T242585]]
* 13:03 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Wikibase: [[gerrit:570911{{!}}Revert "wbterms: Set default for the term store to read new"]] ([[phab:T244529|T244529]]) (duration: 01m 00s)
* 13:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:58 Urbanecm: EU SWAT is done
* 12:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|989c9f8}}: Revert "Revert "Remove handler deleted from the MachineVision extension"" (duration: 00m 58s)
* 12:51 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|989c9f8}}: Revert "Revert "Remove handler deleted from the MachineVision extension"" (duration: 00m 59s)
* 12:49 urbanecm@deploy1001: Finished scap: SWAT: {{Gerrit|799224f}}:  {{Gerrit|137a40e}} ([[phab:T241242|T241242]]; [[phab:T243974|T243974]]) (duration: 20m 18s)
* 12:30 vgutierrez: depool cp5004 and reimage as buster - [[phab:T242093|T242093]]
* 12:29 vgutierrez: pooling cp5005 with buster - [[phab:T242093|T242093]]
* 12:28 urbanecm@deploy1001: Started scap: SWAT: {{Gerrit|799224f}}:  {{Gerrit|137a40e}} ([[phab:T241242|T241242]]; [[phab:T243974|T243974]])
* 12:23 vgutierrez: pooling ncredir1001 with buster - [[phab:T243391|T243391]]
* 12:18 _joe_: running puppet, scap pull on mwdebug1001
* 12:17 vgutierrez: upload trafficserver 8.0.5-1wm15 to apt.wm.o (buster) - [[phab:T244538|T244538]]
* 12:08 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:06 vgutierrez: testing ats 8.0.5-1-wm15 on cp4032 - [[phab:T244538|T244538]]
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|014405a}}: Add throttle rules for OSU Editathon and workshop for cawiki, remove expired ones ([[phab:T244608|T244608]], [[phab:T244645|T244645]]) (duration: 01m 03s)
* 11:57 vgutierrez: depool ncredir1001 and reimage as buster - [[phab:T243391|T243391]]
* 11:57 vgutierrez: pooling ncredir1002 with buster - [[phab:T243391|T243391]]
* 11:43 vgutierrez: pooling cp4027 with buster - [[phab:T242093|T242093]]
* 11:38 vgutierrez: depool ncredir1002 and reimage as buster - [[phab:T243391|T243391]]
* 11:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:22 vgutierrez: depooling cp5011 and cp5005 & reimage as buster - [[phab:T242093|T242093]]
* 11:07 vgutierrez: depool cp4027 & reimage as buster - [[phab:T242093|T242093]]
* 11:07 vgutierrez: pooling ncredir2001 with buster - [[phab:T243391|T243391]]
* 11:03 vgutierrez: pooling cp4028 with buster - [[phab:T242093|T242093]]
* 10:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:47 godog: remove old logs from /var/log/swift on swift hosts
* 10:31 vgutierrez: depool ncredir2001 and reimage as buster - [[phab:T243391|T243391]]
* 10:26 vgutierrez: depool cp4028 & reimage as buster - [[phab:T242093|T242093]]
* 10:14 moritzm: installing sudo security updates for buster
* 08:53 vgutierrez: pooling cp4029 with buster - [[phab:T242093|T242093]]
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight from 1 to 5 for db1107 - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10364 and previous config saved to /var/cache/conftool/dbconfig/20200210-084446-marostegui.json
* 08:43 vgutierrez: pooling ncredir2002 with buster - [[phab:T243391|T243391]]
* 08:34 effie: rolling restart php-fpm on labweb[1001-1002].wikimedia.org,mw*.eqiad.wmnet,scandium.eqiad.wmnet, wtp[1025-1048].eqiad.wmnet
* 08:32 effie: update php-apcu on eqiad - [[phab:T236800|T236800]]
* 08:29 effie: rolling restart php-fpm on cloudweb2001-dev.wikimedia.org,mw[2135-2147,2150-2212,2214-2290].codfw.wmnet,wtp[2001-2020].codfw.wmnet
* 08:23 effie: update php-apcu on codfw - [[phab:T236800|T236800]]
* 07:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:54 moritzm: updating d-i netinst image for Stretch 9.12 point release (which bumped the kernel ABI)
* 07:29 moritzm: updating d-i netinst image for Buster 10.3 point release (which bumped the kernel ABI)
* 07:09 elukey: restore mw1347's mcrouter settings to its default (proxy threads 10 -> 5)
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Place db1107 - MariaDB 10.4 on s1 with minimal weight - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10363 and previous config saved to /var/cache/conftool/dbconfig/20200210-070140-marostegui.json
* 06:55 vgutierrez: depool ncredir2002 and reimage as buster - [[phab:T243391|T243391]]
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1019', diff saved to https://phabricator.wikimedia.org/P10362 and previous config saved to /var/cache/conftool/dbconfig/20200210-065326-marostegui.json
* 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10361 and previous config saved to /var/cache/conftool/dbconfig/20200210-065135-marostegui.json
* 06:47 vgutierrez: depool cp4029 & reimage as buster - [[phab:T242093|T242093]]
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019', diff saved to https://phabricator.wikimedia.org/P10360 and previous config saved to /var/cache/conftool/dbconfig/20200210-064553-marostegui.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10359 and previous config saved to /var/cache/conftool/dbconfig/20200210-064458-marostegui.json
* 06:39 marostegui: Compress db1124:3318 - this will generate lag on s8 wiki replicas - [[phab:T232446|T232446]]
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10358 and previous config saved to /var/cache/conftool/dbconfig/20200210-063716-marostegui.json
* 06:23 marostegui: Remove partitions from db1099:3311, db1099:3318 [[phab:T239453|T239453]]
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool  db1099:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10357 and previous config saved to /var/cache/conftool/dbconfig/20200210-062112-marostegui.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10356 and previous config saved to /var/cache/conftool/dbconfig/20200210-061822-marostegui.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10355 and previous config saved to /var/cache/conftool/dbconfig/20200210-061656-marostegui.json


== 2020-02-09 ==
== 2021-07-28 ==
* 05:11 cdanis: [[phab:T238305|T238305]] hardreset cp3051
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 13:29 moritzm: installing python2.7 security updates on stretch
* 13:08 moritzm: installing python3.5 security updates on stretch
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:27 moritzm: installing nginx security updates on thumbor*
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:27 Amir1: running several long-running queries against pc1007
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 07:53 moritzm: installing aspell security updates on stretch
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php


== 2020-02-08 ==
== 2021-07-27 ==
* 19:12 _joe_: set cpufreq governor to performance on mw1328
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 17:04 _joe_: restarted php7.2-fpm on mw1332
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 16:53 Urbanecm: mwscript resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 12.24.27.50
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 16:47 gjg@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Editathon in Charlotte (duration: 00m 58s)
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 00:05 Jeff_Green: switched payments.wikimedia.org to codfw datacenter due to [[phab:T244610|T244610]]
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 14:33 mmandere: depool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:11 moritzm: installing aspell security updates
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support.  Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 11:23 Lucas_WMDE: EU backport+config window done
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 08:57 _joe_: repooling mw225[12] for apis
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 08:36 jynus: reenabled puppet on mwmaint1002
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 07:52 jynus: disabling puppet on mwmaint1002
* 07:14 moritzm: installing krb security updates on buster
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json


== 2020-02-07 ==
== 2021-07-26 ==
* 22:20 jeh: ceph: round 2 OSD failover and recovery testing on cloudcephosd1003.wikimedia.org [[phab:T240718|T240718]]
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 20:47 mutante: OS install on new install_server VMs worked on second attempt, issues are gone. signed puppet certs for install1003.eqiad.wmnet, install2003.codfw.wmnet, initial puppet runs ([[phab:T224576|T224576]])
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}}
* 20:42 jeh: ceph: OSD failover and recovery testing on cloudcephosd1003.wikimedia.org [[phab:T240718|T240718]]
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 20:32 mutante: ganeti: attempting to reinstall install1003 which failed last time
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019 after on-site maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10350 and previous config saved to /var/cache/conftool/dbconfig/20200207-173850-marostegui.json
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 17:36 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync InitializeSettings again for lols refs [[phab:T233866|T233866]] (duration: 01m 03s)
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 17:32 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/570929 refs [[phab:T233866|T233866]] (duration: 01m 02s)
* 16:06 legoktm: depooled mw2336.codfw.wmnet, mgmt is down too. [[phab:T287394|T287394]]
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019 after on-site maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10349 and previous config saved to /var/cache/conftool/dbconfig/20200207-172541-marostegui.json
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 17:22 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: roll back all wikis to 1.35.0-wmf.16 refs [[phab:T233866|T233866]]
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 17:19 marostegui: Start MySQL on es1019 after onsite maintenance [[phab:T243963|T243963]]
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 16:46 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 16:38 filippo@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 16:13 XioNoX: remove MSS clamping from eqiad/eqord/knams/esams
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:05 andrew@deploy1001: Finished deploy [horizon/deploy@bc777d6]: Fix for [[phab:T243422|T243422]] (duration: 03m 45s)
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:04 vgutierrez: pooling cp4030 with buster - [[phab:T242093|T242093]]
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 16:03 bblack: removing GRE MTU mitigations from cp[135]xxx - [[phab:T232602|T232602]]
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 16:01 andrew@deploy1001: Started deploy [horizon/deploy@bc777d6]: Fix for [[phab:T243422|T243422]]
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 15:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 15:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 15:25 vgutierrez: depool & reimage cp4030 as buster - [[phab:T242093|T242093]]
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 15:21 vgutierrez: pooling cp4031 with buster - [[phab:T242093|T242093]]
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 15:20 vgutierrez: pooling ncredir3001 running buster - [[phab:T243391|T243391]]
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 15:18 marostegui: Restart all instances on db1124 and db1125 to pick up a new replication filter - [[phab:T240094|T240094]]
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:11 marostegui: Restart all instances on db2094 and db2095 to pick up a new replication filter - [[phab:T240094|T240094]]
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 14:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:43 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: REVERT: Wikibase Client: Fix setting name typo ([[phab:T244529|T244529]]) (duration: 01m 40s)
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:43 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=zhwiki --force "Amir Sarabadani (WMDE)" --sysop ([[phab:T244578|T244578]])
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 14:40 hoo@deploy1001: Scap failed!: 9/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 14:38 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase Client: Fix setting name typo ([[phab:T244529|T244529]]) (duration: 01m 20s)
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 14:33 vgutierrez: depool and reimage ncredir3001 as buster - [[phab:T243391|T243391]]
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 14:32 vgutierrez: depool & reimage cp4031 as buster - [[phab:T242093|T242093]]
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 14:23 vgutierrez: pooling ncredir3002 running buster - [[phab:T243391|T243391]]
* 06:39 moritzm: installing krb5 security updates
* 13:26 vgutierrez: pooling cp4021 with buster - [[phab:T242093|T242093]]
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki
* 13:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:51 vgutierrez: depool and reimage ncredir3002 as buster - [[phab:T243391|T243391]]
* 12:42 vgutierrez: depool & reimage cp4021 as buster - [[phab:T242093|T242093]]
* 12:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:57 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 11:25 vgutierrez: pooling ncredir5001 running buster - [[phab:T243391|T243391]]
* 11:24 vgutierrez: pooling cp4022 with buster - [[phab:T242093|T242093]]
* 11:09 akosiaris: undo wikifeeds experiments
* 11:07 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:42 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:36 akosiaris: conduct experiments with stopping/starting uwsgi-ores on ores2001 [[phab:T242705|T242705]]
* 10:24 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 10:23 vgutierrez: depool and reimage ncredir5001 as buster - [[phab:T243391|T243391]]
* 10:14 vgutierrez: depool & reimage cp4022 as buster - [[phab:T242093|T242093]]
* 10:02 akosiaris: increase capacity for wikifeeds by 50% [[phab:T244535|T244535]]
* 10:02 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 10:01 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 09:53 ema: A:mw: increase keepalive_requests from 100 to 200 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570670/ [[phab:T241145|T241145]]
* 09:09 godog: roll restart cassandra instance on restbase-dev
* 09:03 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 09:03 godog: restart cassandra on restbase-dev1004 to test logging pipeline onboard
* 09:01 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 08:59 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312, db1090:3317', diff saved to https://phabricator.wikimedia.org/P10343 and previous config saved to /var/cache/conftool/dbconfig/20200207-085846-marostegui.json
* 08:54 marostegui: Upgrade db1090:3312, db1090:3317
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P10342 and previous config saved to /var/cache/conftool/dbconfig/20200207-085432-marostegui.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10341 and previous config saved to /var/cache/conftool/dbconfig/20200207-084447-marostegui.json
* 08:44 moritzm: installing libexif security updates
* 08:21 akosiaris: deploy https://gerrit.wikimedia.org/r/570726 [[phab:T244535|T244535]] to avoid CPU throttling of wikifeeds
* 08:21 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Increase base weight for db1126', diff saved to https://phabricator.wikimedia.org/P10340 and previous config saved to /var/cache/conftool/dbconfig/20200207-075323-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10339 and previous config saved to /var/cache/conftool/dbconfig/20200207-075234-marostegui.json
* 07:48 marostegui: Remove revision partitions from db2085:3318 [[phab:T239453|T239453]]
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10338 and previous config saved to /var/cache/conftool/dbconfig/20200207-074511-marostegui.json
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10337 and previous config saved to /var/cache/conftool/dbconfig/20200207-074407-marostegui.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10336 and previous config saved to /var/cache/conftool/dbconfig/20200207-074258-marostegui.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10335 and previous config saved to /var/cache/conftool/dbconfig/20200207-073130-marostegui.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10334 and previous config saved to /var/cache/conftool/dbconfig/20200207-073026-marostegui.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10333 and previous config saved to /var/cache/conftool/dbconfig/20200207-063831-marostegui.json
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10332 and previous config saved to /var/cache/conftool/dbconfig/20200207-063402-marostegui.json
* 06:31 elukey: force a puppet run on all ores[12] nodes
* 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10331 and previous config saved to /var/cache/conftool/dbconfig/20200207-062731-marostegui.json
* 06:26 marostegui: Reboot db1107 for update - [[phab:T242702|T242702]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10330 and previous config saved to /var/cache/conftool/dbconfig/20200207-062502-marostegui.json
* 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10329 and previous config saved to /var/cache/conftool/dbconfig/20200207-062345-marostegui.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10328 and previous config saved to /var/cache/conftool/dbconfig/20200207-062043-marostegui.json
* 04:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 04:46 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 04:16 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 04:14 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 04:13 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 04:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:42 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:27 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:25 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:24 robh: eqsin pdu work ongoing starting now.  ps1-603 swapping per [[phab:T242250|T242250]]
* 00:13 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:11 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime


== 2020-02-06 ==
== 2021-07-24 ==
* 23:44 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
* 23:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:37 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244133|T244133]] [cswikisource] Enable VisualEditor in the Edice namespace (duration: 01m 07s)
* 23:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T159711|T159711]] [[phab:T161365|T161365]] [[phab:T164435|T164435]] [nlwiki] Enable VisualEditor in the Project namespace (duration: 01m 08s)
* 23:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:19 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:15 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:10 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T244405|T244405]] Don't trying to assign  to  if it's unset (duration: 01m 07s)
* 22:50 jforrester@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/VisualEditor: [[phab:T242184|T242184]] Change tags method so anon edits will go through (duration: 01m 08s)
* 22:42 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:40 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:38 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:18 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:15 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:13 mutante: turning mw2271 and mw2163 into canary appservers for codfw, this adds mediawiki-testers shell users and removes scap sql scripts, rest stays as is ([[phab:T242606|T242606]])
* 21:54 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:40 twentyafterfour: train blocked due to serious incident related to deploying the latest branch. Incident documentation: https://wikitech.wikimedia.org/wiki/Incident_documentation/20200206-mediawiki refs [[phab:T233866|T233866]]
* 21:30 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 21:05 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 20:52 akosiaris: restart all wikifeeds pods
* 20:48 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 20:45 akosiaris: restart restbase on restbase1027
* 20:32 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 20:30 twentyafterfour: sync-wikiversions --force
* 20:30 twentyafterfour@deploy1001: Scap failed!: 9/11 canaries failed their endpoint checks(http://en.wikipedia.org)
* 20:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 19:45 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T244405|T244405]] Set wgLogoHD before adding wordmark (duration: 01m 06s)
* 19:36 bblack: re-pool cp1075 (eqiad text)
* 19:33 addshore: SWAT done!
* 19:32 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/WikibaseLexemeCirrusSearch: [[phab:T244479|T244479]] Update namespace for PrefetchingTermLookup & fix tests (duration: 01m 06s)
* 19:31 bblack: depool cp1075 (eqiad text) for minor experimentation
* 19:29 addshore@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Babel/includes/Babel.php: [[phab:T243713|T243713]] Timeout for meta api call from 10 to 2 seconds. (duration: 01m 07s)
* 19:28 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Babel/includes/Babel.php: [[phab:T243713|T243713]] Timeout for meta api call from 10 to 2 seconds. (duration: 01m 07s)
* 19:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix incorrect spellings of "RESTBase" in config variables (2/2) 2.IS (duration: 01m 06s)
* 19:23 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: Fix incorrect spellings of "RESTBase" in config variables (2/2) 1.CS (duration: 01m 07s)
* 19:23 cdanis: manual puppet run on netflow1001 looked good; ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin A:netflow "run-puppet-agent --enable 'rollout of {{Gerrit|I60692f0e8}} [[phab:T237587|T237587]] cdanis'"
* 19:22 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:20 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix incorrect spellings of "RESTBase" in config variables (1/2) (duration: 01m 06s)
* 19:20 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:14 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation everywhere [[phab:T243395|T243395]], sync again for luck (duration: 01m 06s)
* 19:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin A:netflow "disable-puppet 'rollout of {{Gerrit|I60692f0e8}} [[phab:T237587|T237587]] cdanis'"
* 19:10 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation everywhere [[phab:T243395|T243395]] (duration: 01m 07s)
* 19:05 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group1 [[phab:T243395|T243395]] (duration: 01m 10s)
* 19:01 moritzm: restarting exim on mendelevium to pick up cyrus-sasl security updates
* 18:58 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:55 moritzm: restarting apache on tungsten/dbmonitor to pick up cyrus-sasl security updates
* 18:53 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@8e15868]: Update mobileapps to {{Gerrit|ceeb950}} (duration: 06m 27s)
* 18:46 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@8e15868]: Update mobileapps to {{Gerrit|ceeb950}}
* 18:36 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:34 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:06 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 17:32 herron: set performance cpu scaling governor on maps*
* 16:49 vgutierrez: pooling ncredir5002 running buster - [[phab:T243391|T243391]]
* 16:38 vgutierrez: pooling cp4023 with buster - [[phab:T242093|T242093]]
* 16:36 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@524be2b]: airflow: Update ores data transfer from drafttopic -> articletopic (duration: 00m 19s)
* 16:35 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@524be2b]: airflow: Update ores data transfer from drafttopic -> articletopic
* 16:35 XioNoX: remove AS prepending in esams/knams
* 16:31 bblack: lvs1013 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:30 bblack: lvs1014 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:30 bblack: lvs1015 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:29 bblack: lvs1016 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:28 moritzm: restarting apache on bromine to pick up SASL security updates
* 16:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:22 moritzm: installing cyrus-sasl2 security updates on jessie
* 16:20 bblack: lvs2001 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:19 bblack: lvs2002 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:19 bblack: lvs2003 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:07 vgutierrez: depool and reimage ncredir5002 as buster - [[phab:T243391|T243391]]
* 16:07 bblack: lvs4005 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:06 bblack: lvs4006 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:06 bblack: lvs4007 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 16:03 vgutierrez: depool & reimage cp4023 as buster - [[phab:T242093|T242093]]
* 16:03 vgutierrez: pooling cp4024 with buster - [[phab:T242093|T242093]]
* 15:59 akosiaris: repool eventgate-analytics/eqiad. Experiment proved the failover wouldn't cause (on its own) a problem. Experiment done.
* 15:58 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
* 15:57 halfak@deploy1001: Finished deploy [ores/deploy@50a101a]: [[phab:T242705|T242705]] (duration: 04m 35s)
* 15:56 vgutierrez: pooling ncredir4001 running buster - [[phab:T243391|T243391]]
* 15:55 moritzm: installing qemu security updates
* 15:54 bblack: lvs5001 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 15:53 bblack: lvs5002 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 15:53 halfak@deploy1001: Started deploy [ores/deploy@50a101a]: [[phab:T242705|T242705]]
* 15:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:52 bblack: lvs5003 - restart pybal for dual bgp session config - [[phab:T180069|T180069]]
* 15:50 moritzm: installing python-ecdsa security updates
* 15:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:41 moritzm: installing jsoup security updates
* 15:30 vgutierrez: depool & reimage ncredir4001 as buster - [[phab:T243391|T243391]]
* 15:29 vgutierrez: depool & reimage cp4024 as buster - [[phab:T242093|T242093]]
* 15:28 vgutierrez: pooling ncredir4002 running buster - [[phab:T243391|T243391]]
* 15:27 moritzm: installing sudo security updates on jessie
* 15:23 vgutierrez: pooling cp4025 with buster - [[phab:T242093|T242093]]
* 15:14 ema: A:mw-api: force puppet run to increase keepalive_requests from 100 to 200 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570670/ [[phab:T241145|T241145]]
* 15:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 godog: extend graphite1004 / graphite2003 fs +200G
* 14:56 vgutierrez: depool and reimage ncredir4002 as buster - [[phab:T243391|T243391]]
* 14:46 vgutierrez: depool & reimage cp4025 as buster - [[phab:T242093|T242093]]
* 14:16 akosiaris: 20mins in with eventgate-analytics/eqiad depooled from discovery, no issues yet.
* 14:14 ema: run puppet on mw-api-canary to revert nginx keepalive_requests bump [[phab:T241145|T241145]]
* 13:55 marostegui: Stop MySQL on es1019, upgrade and poweroff for on-site maintenance - [[phab:T243963|T243963]]
* 13:54 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
* 13:53 akosiaris: depool eqiad eventgate-analytics for testing purposes. Requests will flow to codfw, monitoring https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&from=now-30m&to=now for issues.
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1019 for onsite maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10321 and previous config saved to /var/cache/conftool/dbconfig/20200206-135157-marostegui.json
* 13:45 XioNoX: rollback deactivate BGP transits on cr3-knams
* 13:34 elukey: repool mw1347 with mcrouter running with 10 proxy threads (was: 5)
* 13:31 XioNoX: reboot cr3-knams
* 13:31 elukey: depool mw1347 to test some mcrouter settings
* 13:27 XioNoX: deactivate BGP transits on cr3-knams
* 13:22 vgutierrez: Enable server session sharing on ats-tls in cp4031 - [[phab:T244464|T244464]]
* 13:10 XioNoX: rollback: deactivate BGP transits on cr2-eqsin
* 13:00 XioNoX: reboot cr2-eqsin for sw upgrade
* 13:00 addshore: SWAT done
* 13:00 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: resync REVERT Enable EntitySourceBasedFederation for group1 (duration: 01m 07s)
* 12:59 XioNoX: deactivate BGP transits on cr2-eqsin
* 12:58 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: REVERT Enable EntitySourceBasedFederation for group1 [[phab:T243395|T243395]], due to [[phab:T244479|T244479]] (duration: 01m 07s)
* 12:52 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group1 [[phab:T243395|T243395]] (duration: 01m 06s)
* 12:46 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Babel: REVERT Fetch central babel information over SQL query, not API ([[phab:T243726|T243726]]) (duration: 01m 07s)
* 12:44 addshore@deploy1001: sync-file aborted: Fetch central babel information over SQL query, not API ([[phab:T243726|T243726]]) (duration: 01m 04s)
* 12:40 vgutierrez: pooling cp3065 - [[phab:T242093|T242093]]
* 12:39 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group0 [[phab:T243395|T243395]] (duration: 01m 07s)
* 12:34 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Re-enable delayed new upload jobs for MachineVision extension (duration: 01m 08s)
* 12:26 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove handler deleted from the MachineVision extension (duration: 01m 05s)
* 12:25 XioNoX: remove full-duplex statement from eqsin Tata link (not supported on Junos 18, as 10G is full duplex anyway)
* 12:24 cparle@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/MachineVision: Use the wbsetclaim API to add depicts statements (duration: 01m 09s)
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5e1cbb2}}: Enable CX in te, kn, gu, mr and pawiki as a default tool ([[phab:T243271|T243271]], [[phab:T243272|T243272]], [[phab:T243273|T243273]], [[phab:T243274|T243274]], [[phab:T243275|T243275]]) (duration: 01m 09s)
* 11:41 akosiaris: upgrade etherpad-lite on etherpad1002 to 1.8.0-1
* 11:38 kart_: Updated cxserver to 2020-02-05-051751-production ([[phab:T244230|T244230]], [[phab:T234323|T234323]])
* 11:35 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:33 akosiaris: upload etherpad-lite_1.8.0-1 to apt.wikimedia.org buster-wikimedia/main
* 11:31 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:28 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 11:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:21 akosiaris: undo "switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348". no effect observed
* 10:20 akosiaris: undo "switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348"
* 10:19 vgutierrez: Enabling HTTP keepalive between ats-tls and varnish-frontend on cp4031 - [[phab:T244464|T244464]]
* 10:00 vgutierrez: depool and reimage cp3065 as buster - [[phab:T242093|T242093]]
* 09:59 vgutierrez: upload trafficserver 8.0.5-1wm14 to apt.wm.o (buster) - [[phab:T242093|T242093]]
* 09:08 dcausse@deploy1001: Finished deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}} (duration: 11m 41s)
* 08:56 dcausse@deploy1001: Started deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}}
* 08:45 dcausse@deploy1001: Finished deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}} to wdqs1010.eqiad.wmnet (duration: 00m 29s)
* 08:44 dcausse@deploy1001: Started deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui {{Gerrit|5a1af3b}} to wdqs1010.eqiad.wmnet
* 08:23 marostegui: Reboot dbproxy1012 and dbproxy1014 for upgrade
* 08:18 dcausse: restarting blazegraph on wdqs1006: [[phab:T242453|T242453]]
* 08:17 akosiaris: switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348 to
* 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10319 and previous config saved to /var/cache/conftool/dbconfig/20200206-065906-marostegui.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10318 and previous config saved to /var/cache/conftool/dbconfig/20200206-065238-marostegui.json
* 06:46 elukey: run puppet on all ores[12]* nodes
* 02:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 02:42 mutante: ganeti - Creating new VM named install2003.codfw.wmnet in codfw with row=A vcpu=1 memory=1 gigabytes disk=20 gigabytes link=private ([[phab:T244390|T244390]])
* 02:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 02:30 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 02:21 mutante: ganeti - Creating new VM named install1003.eqiad.wmnet in eqiad with row=C vcpu=1 memory=1 gigabytes disk=20 gigabytes link=private ([[phab:T244390|T244390]])
* 02:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm


== 2020-02-05 ==
== 2021-07-23 ==
* 23:30 ebernhardson: delete search indices duplicated on multiple clusters for: hywwiki, chrwiktionary, gcrwiki, mnwwiki, noboard_chapterswikimedia, nqowiki, nrmwiki, outreachwiki and srnwiki
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 23:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@a51f927]: Update mobileapps to {{Gerrit|a7928fa}} (duration: 10m 48s)
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 22:57 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@a51f927]: Update mobileapps to {{Gerrit|a7928fa}}
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 22:07 mutante: Gerrit - added ppchelko to 'wmf-deployment' Gerrit group (he is already in deployment admin group) ([[phab:T244389|T244389]])
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 21:37 arlolra@deploy1001: Finished deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to {{Gerrit|74730a3}} (duration: 03m 07s)
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 21:33 arlolra@deploy1001: Started deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to {{Gerrit|74730a3}}
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 21:31 mutante: killing and restarting wikibugs, it was reporting each update twice
* 16:15 effie: enable puppet on mc-gp* hosts
* 20:51 joal@deploy1001: Finished deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy (duration: 00m 07s)
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 20:51 joal@deploy1001: Started deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 20:51 joal@deploy1001: Finished deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy (duration: 13m 28s)
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 20:50 mutante: ores1004 - systemctl start celery-ores-worker
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 20:45 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]] (duration: 01m 07s)
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 20:44 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 20:37 joal@deploy1001: Started deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 20:34 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1269.eqiad.wmnet
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 20:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1267.eqiad.wmnet
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 20:25 mutante: mw1267 restarting php7.2-fpm
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 20:21 joal@deploy1001: Finished deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version (duration: 00m 08s)
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 20:21 joal@deploy1001: Started deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 20:09 twentyafterfour: Preparing to deploy wmf/1.35.0-wmf.18 to group1 wikis refs [[phab:T233866|T233866]]
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 20:09 moritzm: installing git security updates for jessie
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 20:00 moritzm: installing unzip security updates
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 19:44 mutante: LDAP - added spramduya to wmf group ([[phab:T243802|T243802]])
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 19:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean up VisualEditor settings (duration: 01m 07s)
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 19:38 ebernhardson: restart mjolnir-kafka-bulk-daemon across eqiad, daemons appear stuck and not reading new messages
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 19:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T238029|T238029]] Enable InukaPageView logging on production Wikipedias (duration: 01m 07s)
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 19:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Sync back revert of {{Gerrit|975b4bbb9}} (duration: 01m 06s)
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 19:10 jforrester@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 18:35 vgutierrez: pooling cp5012 - [[phab:T242093|T242093]]
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 18:23 vgutierrez: rebooting cp5012 - [[phab:T242093|T242093]]
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 18:21 elukey: restart memcached on mc1025 with 8 threads (rollback - revert https://gerrit.wikimedia.org/r/#/c/570370/, run puppet, restart memcached)
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 17:51 mutante: ganeti1017 - rebooting (not in use yet)
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 17:34 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/languages/: [[phab:T244300|T244300]] (duration: 01m 13s)
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 17:33 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/includes/: [[phab:T244300|T244300]] (duration: 01m 14s)
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 16:53 urandom: Sessionstore deployment (mediawiki-config) is done
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 16:37 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:569678]] Config: Enable sessionstore on group0 and 1 [[phab:T243106|T243106]] (duration: 01m 08s)
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 16:25 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T232140|T232140]] Restore wgLogoHD to wikis without a MinervaCustomLogos defined (duration: 01m 09s)
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 16:07 elukey: update puppet compiler's facts
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 15:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 15:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 15:29 effie: restart php-fpm on canaries - [[phab:T236800|T236800]]
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 15:24 effie: Rollout php-apcu_5.1.17+4.0.11-1+0~20190217111312.9+stretch~1.gbp192528+wmf2 to api, app and jobrunner canaries - [[phab:T236800|T236800]]
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 15:15 vgutierrez: depooling & reimaging cp5012 as buster - [[phab:T242093|T242093]]
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 15:12 ema: cp: unset Accept-Encoding from ats-be requests to applayer [[phab:T242478|T242478]]
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 14:35 vgutierrez: updating acme-chief to version 0.24 - [[phab:T244236|T244236]]
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]
* 14:32 _joe_: restarting mcrouter at nice -19 on mw1331 for testing effects of that change
* 14:30 vgutierrez: upload acme-chief 0.24 to apt.wm.o (buster) - [[phab:T244236|T244236]]
* 14:26 XioNoX: push initial flowspec config to all routers
* 14:23 vgutierrez: pooling cp5006 - [[phab:T242093|T242093]]
* 14:13 ema: cp1075: back to leaving Accept-Encoding as it is due to unrelated applayer issues [[phab:T242478|T242478]]
* 13:46 marostegui: Decrease buffer pool size on db1107 for testing - [[phab:T242702|T242702]]
* 13:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:42 akosiaris: undo the manually set 10.2.1.42 eventgate-analytics.discovery.wmnet in /etc/hosts for mw1331, mw1348. Verify hypothesis that this should cause increased latency. Restart php-fpm
* 13:41 ema: cp1075: unset Accept-Encoding on origin server requests [[phab:T242478|T242478]]
* 13:39 Amir1: EU SWAT is done
* 13:38 ema: cp: disable puppet and merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570311/ [[phab:T242478|T242478]]
* 13:35 XioNoX: rollback traffic steering off cr2-eqord
* 13:29 akosiaris: manually set 10.2.1.42 eventgate-analytics.discovery.wmnet in /etc/hosts for mw1331, mw1348. Verify hypothesis that this should cause increased latency
* 13:25 XioNoX: reboot cr2-eqord for software upgrade - yaaaaa
* 13:24 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: SWAT: [[gerrit:570301{{!}}Cache PropertyInfoLookup internally]] ([[phab:T243955|T243955]]) (duration: 01m 07s)
* 13:17 XioNoX: increase ospf cost for cr2-eqord links
* 13:16 vgutierrez: upload acme-chief 0.23 to apt.wm.o (buster) - [[phab:T244236|T244236]]
* 13:15 XioNoX: disable transit/peering BGP sessions on cr2-eqord
* 13:15 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: SWAT: [[gerrit:570301{{!}}Cache PropertyInfoLookup internally]] ([[phab:T243955|T243955]]) (duration: 01m 07s)
* 13:10 XioNoX: rollback: disable transit/peering BGP sessions on cr2-eqdfw
* 13:08 vgutierrez: depooling & reimaging cp5006 as buster - [[phab:T242093|T242093]]
* 13:03 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5cc2b70}}: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos ([[phab:T232140|T232140]]) (duration: 01m 06s)
* 13:01 XioNoX: reboot cr2-eqdfw for software upgrade
* 13:00 Amir1: SWAT needs more time
* 12:55 XioNoX: disable transit/peering BGP sessions on cr2-eqdfw
* 12:50 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: {{Gerrit|d450288}}: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos ([[phab:T232140|T232140]]) (duration: 01m 07s)
* 12:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5cc2b70}}: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos ([[phab:T232140|T232140]]) (duration: 01m 07s)
* 12:32 awight@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Cite: SWAT: [[gerrit:570285{{!}}Revert follow standardization (T240858)]] (duration: 01m 13s)
* 10:53 akosiaris: rolling restart of all pods on kubernetes staging cluster to make sure everything is fine after the upgrade
* 10:50 akosiaris: [[phab:T244335|T244335]] upgrade kubernetes-node on kubestage1002.eqiad.wmnet to 1.13.12
* 10:43 ema: cp4028: varnish-frontend-restart [[phab:T243634|T243634]]
* 10:24 akosiaris: [[phab:T244335|T244335]] upgrade kubernetes-master on neon.eqiad.wmnet (staging)
* 10:24 effie: Upload php-apcu_5.1.17+4.0.11-1+0~20190217111312.9+stretch~1.gbp192528+wmf2 - [[phab:T236800|T236800]]
* 10:10 Urbanecm: Run mwscript deleteEqualMessages.php --delete to delete GrowthExperiments' message overrides (cswiki, viwiki, arwiki, kowiki)
* 09:57 akosiaris: upload kubernetes 1.13.12 to apt.wikimedia.org stretch-wikimedia/main [[phab:T244335|T244335]]
* 09:51 effie: install libmemcached-tools on mc-gp* servers - [[phab:T240684|T240684]]
* 09:05 ema: add individual FortiGate IPs hitting ulsfo (currently cp4028) to vcl blocked_nets -- trying to identify problematic traffic [[phab:T243634|T243634]]
* 07:02 marostegui: Replay s1 traffic on db1107 (10.4) [[phab:T242702|T242702]]
* 06:32 elukey: force a puppet run on ores* hosts
* 06:12 marostegui: Remove partitions from revision table db1098:3317 - [[phab:T239453|T239453]]
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10312 and previous config saved to /var/cache/conftool/dbconfig/20200205-060942-marostegui.json
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3311, db2086:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10311 and previous config saved to /var/cache/conftool/dbconfig/20200205-060911-marostegui.json
* 02:38 cdanis: [[phab:T243634|T243634]] ✔️ cdanis@cp4030.ulsfo.wmnet ~ 🕤🍺 sudo varnish-frontend-restart


== 2020-02-04 ==
== 2021-07-22 ==
* 22:35 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 22:13 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]] (duration: 32m 03s)
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platform migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 22:03 cdanis@cumin2001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 21:41 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.18  refs [[phab:T233866|T233866]]
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 21:29 twentyafterfour: preparing the new mediawiki branch for deployment to test wikis
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 20:31 shdubsh: restart kartotherian on maps2001
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 20:24 shdubsh: temporarily enable access logs on maps2001
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 20:20 twentyafterfour: branching mediawiki to wmf/1.35.0-wmf.18 from commit {{Gerrit|054dd94e97d6}} - train blockers should be added as subtasks under [[phab:T233866|T233866]]
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 20:06 marxarelli: temporarily holding 1.35.0-wmf.18 [[[phab:T233866|T233866]]] branch cut and train due to concurrent maps prod issues
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 19:15 mutante: cp3065 - powercycling
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 18:45 cdanis@cumin2001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 17:57 cdanis: ✔️ cdanis@mw1272.eqiad.wmnet ~ 🕐☕ sudo restart-php7.2-fpm
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 17:41 akosiaris: reenable kartotherian on maps100*
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 17:34 oblivian@cumin1001: conftool action : set/weight=15; selector: cluster=appserver,service=nginx,dc=eqiad,name=mw12[3-5].*
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 17:13 _joe_: restarting php-fpm on mw126[1-3]
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 17:11 _joe_: restarting php-fpm on mw1266-9
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 17:10 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/includes/filerepo/file/ForeignDBFile.php: gerrit: 570089, ongoing incident (duration: 01m 04s)
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 17:07 _joe_: restarted php-fpm on mw1265 with 80 workers (the default)
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 17:07 _joe_: restarted php-fpm on mw1264 with 240 workers
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 16:52 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Wikibase: fix for the recent outage (duration: 01m 21s)
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 16:02 ema: cp: rolling ats-backend-restart to unset Accept-Encoding before sending origin server requests [[phab:T242478|T242478]]
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 14:23 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 14:18 akosiaris: deploy new wikifeeds chart that is consistent with the current scaffolding approach. No code deploy though.
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 14:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 14:16 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 14:07 XioNoX: repool ulsfo
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 14:03 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 14:00 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:36 XioNoX: restart cr3-ulsfo for software upgrade
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 13:23 vgutierrez: upgrading acme-chief to version 0.22 - [[phab:T240614|T240614]]
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 13:10 vgutierrez: uploaded acme-chief 0.22 to apt.wm.o (buster) - [[phab:T240614|T240614]]
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 13:09 XioNoX: restart cr4-ulsfo for upgrade
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 12:49 XioNoX: depool ulsfo for routers upgrade
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 10:35 ema: cp4032: varnish-frontend-restart [[phab:T243634|T243634]]
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 09:08 vgutierrez: manually refreshing OCSP stapling response for non-canonical-redirects-3 - [[phab:T243948|T243948]]
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 09:07 marostegui: Upgrade s3 codfw master db2105 - [[phab:T239791|T239791]]
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 08:56 marostegui: Deploy schema change on enwiki eqiad host by host - [[phab:T243804|T243804]]
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 08:46 marostegui: Deploy schema change on enwiki codfw - [[phab:T243804|T243804]]
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 08:16 marostegui: Deploy schema change on testwiki - [[phab:T243804|T243804]]
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 08:13 marostegui: Deploy schema change on test2wiki - [[phab:T243804|T243804]]
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 07:36 marostegui: Upgrade Mariadb on db1107 from 10.4.11 to 10.4.12 [[phab:T242702|T242702]]
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 07:15 marostegui: Compress db1126 - [[phab:T232446|T232446]]
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10302 and previous config saved to /var/cache/conftool/dbconfig/20200204-071420-marostegui.json
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 07:09 marostegui: Compress db1091 - [[phab:T232446|T232446]]
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10301 and previous config saved to /var/cache/conftool/dbconfig/20200204-070804-marostegui.json
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311, db2086:3317 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10300 and previous config saved to /var/cache/conftool/dbconfig/20200204-070533-marostegui.json
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 06:48 elukey: force a puppet run on all ores[12] nodes
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 00:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [enwiki] Add Commons as an import source [[phab:T242884|T242884]] (duration: 00m 57s)
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 00:09 mutante: gerrit1002 - replaced ens5 with ens6 in /etc/network/interfaces (IP and row had changed in the past, needed manual fix after reboot and now came back) ;  mkfs.ext4 /dev/vdb on new additional 10GB disk. ([[phab:T239151|T239151]] [[phab:T243983|T243983]])
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 00:06 jforrester@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: [nlwiki] Enable VisualEditor by default for all users [[phab:T161365|T161365]] (duration: 00m 58s)
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 00:05 mutante: gerrit1002 - attempt to manually fix /etc/network interfaces , add IP on interface, reboot
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 00:03 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure remainder of testwikis group for kask-session [[phab:T243106|T243106]] (duration: 00m 58s)
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 00:02 volans: depool, varnish-frontend-restart, pool on cp4029 (~242k fds) - [[phab:T243634|T243634]]
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 14:27 moritzm: installing libwebp security updates on stretch
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 11:36 Lucas_WMDE: EU backport+config window done
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:42 effie: restart pybal on lvs2010 and lvs1016
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE


== 2020-02-03 ==
== 2021-07-21 ==
* 23:34 mutante: rebooting gerrit1002 (test VM)
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 23:26 mutante: ganeti1003 - sudo gnt-instance modify --disk add:size=10G gerrit1002.wikimedia.org ([[phab:T239151|T239151]] [[phab:T243983|T243983]])
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:24 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.16
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 23:21 mutante: gerrit1002 - deleting gerrit.log and gerrit.json files from January to free about 4GB of space ([[phab:T239151|T239151]] [[phab:T243983|T243983]])
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:12 XioNoX: removing AS15542 from esams
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:18 andrew@deploy1001: Finished deploy [horizon/deploy@8bffc7d]: Fix for [[phab:T243355|T243355]] (duration: 03m 29s)
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:14 andrew@deploy1001: Started deploy [horizon/deploy@8bffc7d]: Fix for [[phab:T243355|T243355]]
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:13 mutante: rebooting ganeti1010, ganeti1011 and other new ganeti machines to pick up microcode mitigations; for some reason the previous reboots did not do it. Rescheduled service check on icinga for ganeti1010 and it has now recovered ([[phab:T228924|T228924]])
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 22:05 mutante: ganeti1010 - rebooting host to clear microcode mitigations CPU alert
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:39 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.35.0-wmf.15"
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:33 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.16
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 21:28 brennen@deploy1001: Synchronized php-1.35.0-wmf.16/includes/TemplateParser.php: Syncing https://gerrit.wikimedia.org/r/c/mediawiki/core/+/569643 for [[phab:T243548|T243548]] (duration: 01m 08s)
* 20:27 dancy: testing upcoming Scap release on beta
* 21:14 halfak@deploy1001: Finished deploy [ores/deploy@50a101a]: [[phab:T243451|T243451]] (duration: 12m 47s)
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 21:01 halfak@deploy1001: Started deploy [ores/deploy@50a101a]: [[phab:T243451|T243451]]
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 20:43 mutante: doc1001 - sudo chown -R doc-uploader:doc-uploader /srv/docroot/
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 20:19 XioNoX: reactivate L3 only LB in esams/knams
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 20:19 XioNoX: remove test flowspec rule from cr3-knams
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 20:13 mutante: doc1001 - re-enabled puppet after merging gerrit:569620 - Git::Clone[integration/docroot]/File[/srv/docroot]/mode: mode changed '2775' to '0755' - Profile::Doc/File[/srv/docroot/org/wikimedia/doc]/group: group changed 'doc-uploader' to 'wikidev', mode changed '0775' to '0755'. needs another follow-up ([[phab:T237707|T237707]])
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 19:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [officewiki] Enable VisualEditor desktop section editing (duration: 01m 07s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 19:21 Urbanecm: Morning SWAT done
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 19:20 urbanecm@deploy1001: Synchronized wmf-config/InterwikiSortOrders.php: SWAT: {{Gerrit|7b53a52}}: Add gcr, mnw and szy to InterwikiSortOrders (duration: 01m 11s)
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 19:19 mutante: doc1001 - chown -R doc-uploader:doc-uploader /srv/docroot ; temp. disabled puppet ([[phab:T237707|T237707]])
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 19:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|7bb6a12}}: Configure remainder of testwikis group for kask-transition ([[phab:T243106|T243106]]) (duration: 01m 14s)
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 18:58 mutante: < bblack> !log doc1001: chown -R nobody:wikidev /srv/docroot {{!}} < mutante> !doc1001 sudo -u doc-uploader chmod g+w /srv/docroot/org/wikimedia/doc  {{!}} https://gerrit.wikimedia.org/r/c/operations/puppet/+/484304 {{!}} ([[phab:T237707|T237707]])
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 18:44 bblack: doc1001: chown -R nobody:wikidev /srv/docroot
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 18:34 brennen: edited /srv/mediawiki-staging/wikiversions.json on deploy1001; scap pull and scap wikiversions-compile on mwdebug1002; revert wikiversions changes on deploy1001.
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 18:25 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 18:23 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 18:17 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 16:52 eevans@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 16:48 eevans@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 16:38 eevans@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 15:38 XioNoX: rollback: add debug on eqiad-knams link interfaces - [[phab:T240659|T240659]]
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 15:33 XioNoX: add debug on eqiad-knams link interfaces - [[phab:T240659|T240659]]
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 14:59 moritzm: restarting exim on phab* to pick up libidn security update
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 14:55 moritzm: restarting superset on an-tool1004/1005 to pick up libidn security update
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 14:44 moritzm: restarting apache on an-tool*, cloudmetrics*, logstash*, grafana1002 to pick up libidn security update
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 14:21 moritzm: restarting slapd on ldap-corp* to pick up libidn2 security updates
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 14:18 cdanis: [[phab:T243634|T243634]] ✔️ cdanis@cp4031.ulsfo.wmnet ~ 🕤☕ sudo varnish-frontend-restart
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 13:58 moritzm: installing libidn2 security updates
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 13:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 13:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 13:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 13:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 13:31 moritzm: rebooting ganeti1009 - ganeti1022 to pick up microcode update [[phab:T228924|T228924]]
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 12:58 XioNoX: deactivate v6 BGP to AS25596
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 12:57 moritzm: installing spamassassin security updates
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 12:53 Urbanecm: Previous message should be "EU SWAT done"
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 12:52 Urbanecm: Morning SWAT done
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 12:52 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/zh_classicalwiki*.png ([[phab:T243509|T243509]])
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 12:51 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|af0b745}}: Update logo for zh_classical wiki ([[phab:T243509|T243509]]) (duration: 01m 06s)
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 12:45 urbanecm@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: {{Gerrit|e9387b2}}: Disable MobileFrontend Mainpage special casing on frwiktionary ([[phab:T241888|T241888]]) (duration: 01m 05s)
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 12:40 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|5f13c19}}: Add minerva custom log for la.wiki ([[phab:T240728|T240728]]; 2/2) (duration: 01m 06s)
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 12:37 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: {{Gerrit|5f13c19}}: Add minerva custom log for la.wiki ([[phab:T240728|T240728]]; 1/2) (duration: 01m 06s)
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 12:35 moritzm: installing openjpeg2 security updates
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 12:32 Urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-szl.svg ([[phab:T233104|T233104]])
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 12:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: {{Gerrit|76e67cd}}: {{Gerrit|e266e25}}: Add wordmarks for szlwiki and etwiki ([[phab:T233104|T233104]], [[phab:T230379|T230379]]) (duration: 01m 06s)
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 12:29 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: {{Gerrit|76e67cd}}: {{Gerrit|e266e25}}: Add static wordmarks for szlwiki and etwiki ([[phab:T233104|T233104]], [[phab:T230379|T230379]]) (duration: 01m 06s)
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 12:25 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|32e0356}}: Add vzg-easydb.gbv.de to the wgCopyUploadsDomains ([[phab:T243118|T243118]]) (duration: 01m 07s)
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 12:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|6c48af8}}: Assign editautopatrolprotected to hewiki patrollers ([[phab:T243665|T243665]]) (duration: 01m 06s)
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 12:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|6b497e7}}: Wikidata - enable TaintedRefs ([[phab:T241989|T241989]]) (duration: 01m 06s)
* 10:50 moritzm: installing systemd security updates on bullseye
* 12:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0c0ef87}}: Add wgImportSources for hiwikibooks ([[phab:T244022|T244022]]) (duration: 01m 05s)
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Remove $wgImgAuthDetails=true ([[phab:T153459|T153459]]) (duration: 01m 36s)
* 10:14 effie: enable puppet on mw* servers
* 11:38 ema: powercycle cp3057 [[phab:T244127|T244127]] [[phab:T238305|T238305]]
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 10:24 godog: temp disable puppet on cp hosts as precaution for https://gerrit.wikimedia.org/r/c/operations/puppet/+/563977
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 10:08 moritzm: installing sudo security updates on stretch
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 08:45 effie: disable puppet on codfw mw hosts to deploy 702592
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:17 effie: enable puppet on alert*
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 07:16 godog: powercycle ms-be2048
* 07:03 moritzm: installing systemd security updates on stretch
* 06:51 effie: restart memcached on eqiad mc* hosts
* 06:51 effie: enable puppet on mc* hosts
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2020-02-02 ==
== 2021-07-20 ==
* 19:25 effie: restart varnish on cp4028
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 08:48 effie: reboot host analytics1061 - [[phab:T244081|T244081]]
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 17:06 rzl: enabled puppet on A:mw
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
* 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 14:53 urbanecm: Start server-side upload for 7 large PNG files ([[phab:T285708|T285708]])
* 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:46 vgutierrez: depool dns1002 - [[phab:T286069|T286069]]
* 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
* 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10{{!}}0[1-9]).codfw.wmnet
* 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - [[phab:T285643|T285643]]
* 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 12:44 moritzm: installing systemd security updates on buster
* 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
* 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
* 11:58 Lucas_WMDE: EU config+backport window done
* 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (3/3) (duration: 00m 56s)
* 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (2/3) (duration: 00m 56s)
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (1/3) (duration: 00m 56s)
* 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 3/3) (duration: 00m 56s)
* 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 2/3) (duration: 00m 56s)
* 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 1/3) (duration: 00m 57s)
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705498{{!}}Add patroller group for ckbwiki (T285221)]] (duration: 00m 57s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (2/2, beta) (duration: 00m 56s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (1/2, prod) (duration: 00m 57s)
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704867{{!}}Update config for language switching on pilot wikis (T286459)]] (duration: 00m 59s)
* 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
* 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
* 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
* 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
* 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
* 03:17 eileen: civicrm revision changed from {{Gerrit|20e9ef6bbb}} to {{Gerrit|819c11307d}}, config revision is {{Gerrit|bb405c5232}}


== 2020-02-01 ==
== 2021-07-19 ==
* 18:17 effie: pool scb2003, no need for host to stay depooled - [[phab:T244069|T244069]]
* 20:48 urbanecm: Deploy security patch for [[phab:T286884|T286884]]
* 17:46 cdanis: [[phab:T243634|T243634]] ✔️ cdanis@cp4030.ulsfo.wmnet ~ 🕐☕ sudo varnish-frontend-restart
* 20:29 vgutierrez: pool text@codfw - [[phab:T286921|T286921]]
* 17:27 effie: depool scb2003 [[phab:T244069|T244069]]
* 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:51 effie: pool mw1273
* 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
* 16:50 effie: pool scb2003
* 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467{{!}}prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)
* 16:30 elukey: powerup analytics1073 (attempt to see if it was only a kernel-related crash) - [[phab:T244064|T244064]]
* 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
* 16:16 effie: poweroff analytics1073 - [[phab:T244064|T244064]]
* 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448{{!}}Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)
* 16:16 effie: poweroff analytics1073 - /T244064
* 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - [[phab:T286921|T286921]]
* 16:16 effie: poweroff analytics1073
* 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - [[phab:T286921|T286921]]
* 13:00 effie: depool scb2003
* 18:46 brennen: gerrit1001: restarting gerrit
* 12:21 effie: depool mw1273
* 18:40 vgutierrez: stop pybal on lvs2009 - [[phab:T286921|T286921]]
* 01:03 eileen: process-control config revision is {{Gerrit|c3c8bde761}}
* 18:38 brennen: re-enabling puppet on gerrit1001
* 00:50 eileen: civicrm revision changed from {{Gerrit|fcc5673ee7}} to {{Gerrit|ee9edf8137}}, config revision is {{Gerrit|2a61da0ace}}
* 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - [[phab:T286921|T286921]]
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P<nowiki>{</nowiki>cloudelastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - [[phab:T286921|T286921]]
* 18:20 vgutierrez: enabling pybal on lvs2007 - [[phab:T286921|T286921]]
* 18:19 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
* 18:06 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:705437{{!}}pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s)
* 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
* 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
* 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
* 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
* 17:30 volans: running puppet on elastic2038 after network was restored
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
* 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:23 volans: running authdns-update to force-update authdns2001
* 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:21 XioNoX: remove ns1 redirect - [[phab:T286787|T286787]]
* 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
* 17:10 XioNoX: enable asw-a2-codfw access ports - [[phab:T286787|T286787]]
* 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - [[phab:T286787|T286787]]
* 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
* 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
* 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
* 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
* 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
* 16:40 XioNoX: update asw-a2-codfw serial number - [[phab:T286787|T286787]]
* 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
* 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
* 16:21 mutante: depooled logstash2021 for dcops maintenance work
* 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
* 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
* 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|310be45f7}} (duration: 00m 57s)
* 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
* 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
* 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2bdfbd258e}} (duration: 00m 57s)
* 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I069c7b53}} (duration: 00m 58s)
* 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
* 15:10 godog: +100G to prometheus/ops in codfw
* 14:59 vgutierrez: rolling restart of eqiad pybal instances
* 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:42 vgutierrez: rolling restart of codfw pybal instances
* 14:33 vgutierrez: rolling restart of eqsin pybal instances
* 14:23 vgutierrez: rolling restart of ulsfo pybal instances
* 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
* 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
* 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
* 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
* 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
* 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
* 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
* 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
* 11:40 moritzm: installing bluez security updates
* 11:31 Lucas_WMDE: EU backport+config window done
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:703205{{!}}Add config for updated PropertySuggester beta cluster (T285098)]] (beta-only) (duration: 00m 57s)
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 09:52 moritzm: imported megacli for bullseye-wikimedia [[phab:T282272|T282272]] [[phab:T275873|T275873]]
* 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
* 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
* 08:15 vgutierrez: depool codfw text traffic
* 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
* 03:26 twentyafterfour: restarted phd on phab1001
* 03:25 twentyafterfour: investigating PHD failure


== 2020-01-31 ==
== 2021-07-16 ==
* 22:25 eileen: civicrm revision changed from {{Gerrit|ac730a6bcb}} to {{Gerrit|fcc5673ee7}}, config revision is {{Gerrit|2a61da0ace}}
* 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 22:14 bstorm_: repooled labsdb1011 now that view work is done
* 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 22:00 eileen: process-control config revision is {{Gerrit|2a61da0ace}} disabled process-control
* 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 21:59 bstorm_: depooled labsdb1011
* 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 21:32 bstorm_: updated views on labsdb1010
* 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
* 21:22 bstorm_: updated views on labsdb1009
* 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
* 21:21 bstorm_: updated actor views on labsdb1012
* 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P<nowiki>{</nowiki>elastic2*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
* 18:17 bblack: repool cp4032 (buster)
* 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
* 18:17 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp4032.ulsfo.wmnet
* 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
* 18:14 bblack: repool cp4029
* 15:48 vgutierrez: restart pybal on lvs2010
* 18:13 bblack: restarted ats-tls and varnish-fe on cp4029
* 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:05 bblack: depool varnish-fe on cp4029
* 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 18:03 bblack: depool ats-tls on cp4029
* 15:24 godog: downtime flappy pages in codfw for 40 minutes
* 16:59 marostegui: Re-enable notifications on the dbstore1005:3318 check [[phab:T243871|T243871]]
* 15:14 godog: set alert2001 as active in netbox (was staged) - [[phab:T247966|T247966]]
* 09:18 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --sleep 4 --batch-size=25 # In a screen for [[phab:T219301|T219301]]
* 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
* 03:22 mutante: powercycling crashed cp3063
* 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
* 01:09 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@322ee4c]: Update mobileapps to {{Gerrit|3eec28d}} (duration: 06m 53s)
* 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw [[phab:T286787|T286787]]
* 01:02 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@322ee4c]: Update mobileapps to {{Gerrit|3eec28d}}
* 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
* 00:41 mutante: contint1001/contint2001 - upgrading jenkins to 2.219
* 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 00:36 mutante: releases2001: upgrading jenkins to 2.219; install1002: import jenkins 2.219 into jessie-wikimedia APT repo
* 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
* 00:31 mutante: importing jenkins 2.219 to stretch-wikimedia APT repo; releases1001: upgrading jenkins to 2.219
* 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
* 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
* 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
* 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers ([[phab:T279309|T279309]])
* 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 12:39 mutante: mw1412 through mw1428 - set to active in netbox ([[phab:T279309|T279309]])
* 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
* 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
* 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
* 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
* 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
* 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
* 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
* 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
* 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
* 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
* 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)


== 2020-01-30 ==
== 2021-07-15 ==
* 19:37 mutante: copying /var/log/apache2 to /root on all eqiad mw appservers to preserve logs
* 23:32 brennen: checking stashbot: [[phab:T286756|T286756]]
* 18:07 vgutierrez: depool cp4032 and perform a rolling restart of varnish-fe at cp4027-cp4031 - [[phab:T243634|T243634]]
* 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: [[gerrit:704815{{!}}Fix creation of mw.Message objects (T286385)]] (duration: 00m 57s)
* 17:51 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Wikibase/lib/includes/Store/Sql/Terms/FingerprintableEntityTermStoreTrait.php: wbterms: Fix incorrect deletion of rows in findActuallyUnusedTermIds ([[phab:T243944|T243944]]) (duration: 01m 06s)
* 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # [[phab:T285811|T285811]]
* 17:49 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Wikibase/repo/maintenance/rebuildItemTerms.php: wbterms: Write only to the new term store in rebuildItemTerms ([[phab:T243944|T243944]]) (duration: 01m 09s)
* 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # [[phab:T285811|T285811]]
* 17:03 vgutierrez: repooling cp4032 - [[phab:T243634|T243634]]
* 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # [[phab:T285811|T285811]]
* 17:02 vgutierrez: restarting varnish-frontend on cp4031 before it crashes - [[phab:T243634|T243634]]
* 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
* 16:26 vgutierrez: manually refreshing OCSP stapling response for non-canonical-redirects-3 - [[phab:T243948|T243948]]
* 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 12:22 arturo: add prometheus 2.7.1+ds-3+k8s+buster to buster-wikimedia [[phab:T238096|T238096]] (basically a rebuild from stretch)
* 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki [[phab:T284928|T284928]]
* 06:23 vgutierrez: restarting varnish-frontend on cp4030 before it crashes - [[phab:T243634|T243634]]
* 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
* 06:21 vgutierrez: depool cp4032 - [[phab:T243634|T243634]]
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 05:12 vgutierrez: restarting varnish-frontend and repooling cp4029 - [[phab:T243634|T243634]]
* 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 05:00 vgutierrez: depooling cp4029
* 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
* 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 06s)
* 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 07s)
* 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
* 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
* 16:40 ejegg: updated payments-wiki from {{Gerrit|d9892207c1}} to {{Gerrit|844b59ee42}}
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 16:27 ejegg: updated fundraising CiviCRM from {{Gerrit|e0d53c92b5}} to {{Gerrit|20e9ef6bbb}}
* 16:24 ejegg: updated payments-wiki from {{Gerrit|0e7800027a}} to {{Gerrit|844b59ee42}}
* 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704773{{!}}flaggedrevs: Allow admins of idwiki to change stablesettings (T268317)]], try II (duration: 01m 05s)
* 15:03 Amir1: temporarily becoming admin on idwiki to debug [[phab:T268317|T268317]]
* 15:02 moritzm: installing nginx security updates on ms-fe*
* 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
* 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
* 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
* 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
* 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
* 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
* 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
* 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 13:05 mutante: mw1413 - pooling; was depooled for an unknown reason (don't see it in SAL), looks ok, scap pulled
* 13:03 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1422.eqiad.wmnet
* 13:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[3-5].eqiad.wmnet
* 12:54 mutante: mw1423, mw1424, mw1425 - pooled as new API servers
* 12:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:53 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[3-5].eqiad.wmnet
* 12:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 12:34 mutante: mw1423, mw1424, mw1425 - scap pull
* 12:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:09 mutante: mw1423,mw1424,mw1425 - rebooting
* 11:48 moritzm: restarting restbase1028-1030 to pick up libuv security update
* 11:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 11:47 mutante: mw1423, mw1424, mw1425 - initial puppet run, new API appservers going into production
* 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704527{{!}}Make idwiki use protect mode of flaggedrevs (T268317)]] (duration: 01m 07s)
* 11:40 moritzm: restarting Etherpad to pick up libuv security update
* 11:37 moritzm: restarting Turnilo to pick up libuv security update
* 11:34 moritzm: installing libuv1 security updates
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 10 hosts
* 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 11:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 11:05 volans@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 10:56 volans: commented out cron-spam entries on thanos-fe2001, puppet is disabled, thanos-store.service fails to start - [[phab:T285835|T285835]]
* 10:41 godog: move wikibase.queryService.ui.app to wikibase.queryService.ui.index.app - [[phab:T272128|T272128]]
* 10:34 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 10:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 10:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 10:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 10:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:26 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 10:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 10:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 10:02 effie: disabling puppet on maps* for 704394
* 09:38 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:11 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-8].eqiad.wmnet
* 09:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:29 elukey: sudo rm /etc/rawdog/en/feeds/847a7185.state* on planet1002 (corrupted file) - backup in /home/elukey + restart planet-update-en.service
* 08:12 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-8].eqiad.wmnet
* 08:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 08:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 07:48 moritzm: updated bullseye d-i image for latest daily build [[phab:T275873|T275873]]
* 07:31 godog: reimage thanos-fe2001 with bullseye - [[phab:T285835|T285835]]
* 07:23 elukey: restart planet-update-en.service on planet1002
* 07:17 elukey: remove /etc/rawdog/en/<nowiki>{</nowiki>state,state.lock<nowiki>}</nowiki> on planet1002 (following what rawdog suggested) due to corrupted files (backups available in /home/elukey/en)
* 06:51 elukey: restart phabricator_clean_tmp_files.service on phab1001 - transient error (tmp files already cleaned up)
* 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 06s)
* 06:47 tstarling@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 07s)
* 05:50 kart_: Updated cxserver to 2021-07-14-124232-production ([[phab:T282369|T282369]], [[phab:T284450|T284450]])
* 05:47 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:43 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:41 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 00:00 twentyafterfour: phabricator update deployed.


== 2020-01-29 ==
== 2021-07-14 ==
* 23:37 marostegui: Remove partitions from db2087:3317 - [[phab:T239453|T239453]]
* 23:23 eileen: civicrm revision changed from {{Gerrit|b1c63470bb}} to {{Gerrit|e0d53c92b5}}, config revision is {{Gerrit|bb405c5232}}
* 18:17 XioNoX: move knams netflow sampling to cr3-knams
* 21:19 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.14 (duration: 01m 05s)
* 17:19 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Ice8dad2}} (duration: 01m 10s)
* 21:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.14
* 01:11 vgutierrez: varnish-frontend restarted on cp4031
* 21:08 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/User.php: Backport: [[gerrit:704609{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 01:09 vgutierrez: repool cp4031
* 20:58 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/User.php: Backport: [[gerrit:704608{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 01:05 marostegui: Disable notifications for dbstore1005:3318 slave lag - [[phab:T243871|T243871]]
* 20:51 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.14
* 01:03 vgutierrez: depool cp4031
* 19:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/resources: Backport: [[gerrit:704606{{!}}Fix deprecated offset() on invalid DOM (T185629)]] (duration: 01m 07s)
* 00:35 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1097:3314 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10289 and previous config saved to /var/cache/conftool/dbconfig/20200129-003507-marostegui.json
* 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@156a984]: fix trove-dashboard bug (duration: 04m 18s)
* 00:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3314 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10288 and previous config saved to /var/cache/conftool/dbconfig/20200129-002203-marostegui.json
* 19:26 andrew@deploy1002: Started deploy [horizon/deploy@156a984]: fix trove-dashboard bug
* 19:17 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 19:17 nskaggs@cumin1001: Added views for new wiki: dagwiki [[phab:T284456|T284456]]
* 18:55 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:54 nskaggs@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
* 18:54 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:36 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 18:36 nskaggs@cumin1001: Added views for new wiki: banwikisource [[phab:T284390|T284390]]
* 18:30 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:14 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 17:52 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 17:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 17:49 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 17:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 17:39 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 17:35 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 17:35 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704383{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 06s)
* 17:00 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704382{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 05s)
* 16:27 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 16:26 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 16:11 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Translate: Backport: [[gerrit:704404{{!}}TranslationAid: Handle empty message definition (T285830)]] and [[gerrit:704405{{!}}TranslationAid: Make sure to return successfully fetched definitions (T285830)]] (duration: 01m 09s)
* 16:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:37 moritzm: installing klibc security updates
* 15:36 ottomata: deploying eventgate-analytics with direct service-runner Prometheus support
* 15:34 ryankemper: [Elastic] Manually triggering readahead mitigation across whole fleet to prevent any further issues today: `ryankemper@cumin1001:~$ sudo cumin -b 12 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl restart elasticsearch-disable-readahead.service'` (still need to investigate why `elasticsearch-disable-readahead.timer` isn't re-firing every 30 mins as desired)
* 15:34 moritzm: installing apache security updates on otrs1001 (ticket.wikimedia.org)
* 15:34 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:28 urbanecm: Start server-side upload of 3 large image files ([[phab:T285708|T285708]])
* 15:16 moritzm: installing apache security updates on lists1001 (lists.wikimedia.org)
* 14:51 moritzm: installing apache security updates on puppet masters
* 14:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2384.codfw.wmnet
* 14:47 effie: set mw2384 as inactive to investigate mw2383 issue - [[phab:T286463|T286463]]
* 14:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 14:44 moritzm: installing apache security updates on grafana*
* 14:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 14:43 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 14:40 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:40 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1422.eqiad.wmnet
* 14:33 dcausse: running elasticsearch-madvise-random ES_PID on elastic2045
* 14:31 dcausse: running elasticsearch-madvise-random 1022 on elastic2054
* 14:23 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:13 elukey: restart php-fpm on mw2370
* 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 13:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 13:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277118|T277118]]
* 13:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277118|T277118]]
* 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1005.eqiad.wmnet
* 12:43 urbanecm: Start server-side upload of 3 large image files ([[phab:T285708|T285708]])
* 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1005.eqiad.wmnet
* 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 12:23 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 12:15 mutante: mw1422 - scap pull
* 12:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1422.eqiad.wmnet
* 12:02 moritzm: upgrading python3-wmflib fleetwide to 0.0.8 (needed for new logout.d wrapper)
* 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
* 12:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
* 11:52 mutante: mw1422 - new setup, not in prod yet
* 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
* 11:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
* 11:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 11:49 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704525{{!}}Remove reviewer user group in ruwiki (T284589)]] (duration: 01m 05s)
* 11:40 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
* 11:39 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:700854{{!}}flaggedrevs: Reduce levels for ruwiki to 1 (T284589)]] (duration: 01m 05s)
* 11:37 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
* 11:23 ariel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|72027e136f10867f5db02043b7505390e49130d1}}: Disable indexing in NS_USER and NS_USER_TALK on bnwiki ([[phab:T286152|T286152]]) (duration: 02m 07s)
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4dc11d2333cbf70a4eb20f3fb94a9e363b41d2df}}: Change category name of Babel extension on Javanese Wikipedia ([[phab:T286165|T286165]]) (duration: 02m 10s)
* 10:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 09:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277118|T277118]]
* 09:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277118|T277118]]
* 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277118|T277118]]
* 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277118|T277118]]
* 09:27 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php-1.37.0-wmf.14]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=testwiki # [[phab:T285811|T285811]]
* 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277118|T277118]]
* 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277118|T277118]]
* 07:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277118|T277118]]
* 07:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277118|T277118]]
* 07:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277118|T277118]]
* 07:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277118|T277118]]
* 07:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T277118|T277118]]
* 07:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T277118|T277118]]
* 00:58 eileen: process control updated to {{Gerrit|c291b3c6890364281d}}
* 00:58 eileen: {{Gerrit|c291b3c6890364281d}}
* 00:49 eileen: civicrm revision changed from {{Gerrit|bb62188ec6}} to {{Gerrit|b1c63470bb}}, config revision is {{Gerrit|c291b3c689}}
* 00:48 eileen: process-control config revision is {{Gerrit|c291b3c689}}
* 00:15 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fix conf cache conflict with Defines.php noticed in beta (duration: 02m 09s)


== 2020-01-28 ==
== 2021-07-13 ==
* 23:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3314 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10287 and previous config saved to /var/cache/conftool/dbconfig/20200128-235336-marostegui.json
* 23:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: {{Gerrit|f3627361ff558c89d4a4452ff24b3457f46a4f46}}: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector ([[phab:T286587|T286587]]) (duration: 02m 08s)
* 23:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3314 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10286 and previous config saved to /var/cache/conftool/dbconfig/20200128-234601-marostegui.json
* 23:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: {{Gerrit|f3627361ff558c89d4a4452ff24b3457f46a4f46}}: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector ([[phab:T286587|T286587]]) (duration: 02m 07s)
* 23:42 marostegui@cumin1001: dbctl commit (dc=all): 'Start repooling db1084 with its original weight', diff saved to https://phabricator.wikimedia.org/P10285 and previous config saved to /var/cache/conftool/dbconfig/20200128-234219-marostegui.json
* 23:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
* 23:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10284 and previous config saved to /var/cache/conftool/dbconfig/20200128-234037-marostegui.json
* 23:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
* 15:06 addshore: Start addshore@mwmaint1002:~$ ./T219123.sh # Taking over from @ladsgroup for [[phab:T219123|T219123]]
* 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
* 09:59 effie: rolling restart mobileapps in codfw
* 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
* 02:05 mutante: gerrit1002 - gzipping a bunch of /var/log/gerrit/ log files ([[phab:T243808|T243808]])
* 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
* 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
* 22:22 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
* 22:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
* 22:18 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Use Score with lilypond's safe mode only (duration: 02m 06s)
* 20:53 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 20:30 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/skins/Skin.php: Backport: [[gerrit:704368{{!}}links is flat array (T286040)]] (duration: 02m 07s)
* 20:26 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.9 (duration: 04m 21s)
* 20:19 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.14 (duration: 31m 56s)
* 19:47 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.14
* 19:02 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1283.eqiad.wmnet
* 17:45 mutante: mw1283 - decom - powered off by cookbook
* 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1283.eqiad.wmnet
* 17:41 mutante: homer "asw2-a*eqiad*" commit "decom mw1282 - [[phab:T280203|T280203]]"
* 17:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
* 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
* 17:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
* 17:09 mutante: mw1282 - decom, powered off
* 17:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
* 17:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1281.eqiad.wmnet
* 17:05 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: Backport: [[gerrit:704181{{!}}Do not lock user_preferences before updating (T286521)]] (duration: 01m 58s)
* 16:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade [[phab:T286226|T286226]]
* 16:55 jbond: upload statograph to buster wikimedia
* 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1281.eqiad.wmnet
* 16:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom [[phab:T280203|T280203]]
* 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom [[phab:T280203|T280203]]
* 16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom [[phab:T280203|T280203]]
* 16:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom [[phab:T280203|T280203]]
* 16:25 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw128[1-3].eqiad.wmnet
* 15:34 topranks: Adding IX peering to AS393950 (Xiber LLC) on cr2-eqiad.
* 15:20 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:19 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:52 volker-e@deploy1002: Finished deploy [design/style-guide@5c07233]: Deploy design/style-guide: {{Gerrit|5c07233}} “Components”: Add WikimediaUI theme Figma links to various components (#483) (duration: 00m 06s)
* 14:52 volker-e@deploy1002: Started deploy [design/style-guide@5c07233]: Deploy design/style-guide: {{Gerrit|5c07233}} “Components”: Add WikimediaUI theme Figma links to various components (#483)
* 14:35 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 14:35 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 13:57 otto@deploy1002: Finished deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job - [[phab:T271232|T271232]] (duration: 03m 28s)
* 13:53 otto@deploy1002: Started deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job - [[phab:T271232|T271232]]
* 13:37 effie: rolling restart php-fpm across clusters - [[phab:T286260|T286260]]
* 13:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/lib/includes/SimpleCacheWithBagOStuff.php: Backport: [[gerrit:704176{{!}}Send TTL instead of expiry in unix timestamp in calling BagOStuff (T286260)]] (duration: 00m 58s)
* 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
* 13:29 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
* 13:14 kormat: restarted replication on db1117:3325 [[phab:T284622|T284622]]
* 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 1732 hosts
* 13:10 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 13:10 hashar: Upgraded Apache on gerrit1001 and gerrit2001
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
* 13:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
* 12:53 kormat: stopping replication on db1117:3325 [[phab:T284622|T284622]]
* 12:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 [[phab:T284622|T284622]]
* 12:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 [[phab:T284622|T284622]]
* 12:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
* 12:41 mutante: depooling and decom'ing eqiad API servers mw1281, mw1282, mw1283 - [[phab:T280203|T280203]]
* 12:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[1-3].eqiad.wmnet
* 12:20 mutante: mwmaint1002 - scap pull after reimaging
* 11:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
* 11:28 Lucas_WMDE: EU backport+config window done
* 11:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:704304{{!}}Remove obsolete $wgShowDBErrorBacktrace config]] (duration: 01m 25s)
* 11:13 mutante: mwmaint1002 - reimaging with buster ([[phab:T267607|T267607]])
* 10:54 mutante: switching https://noc.wikimedia.org backend from eqiad to codfw for mwmaint1002 OS upgrade, not affecting config-master/pybal, tests passed ([[phab:T267607|T267607]])
* 10:44 moritzm: upgrading apache on phab1001 (phabricator.wikimedia.org)
* 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
* 10:39 hnowlan: running `nodetool decommission` on maps2008
* 10:27 moritzm: installing apache security updates on alert1001 (icinga.wikimedia.org)
* 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277116|T277116]]
* 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T277116|T277116]]
* 10:18 moritzm: installing apache security updates on Logstash hosts
* 09:58 moritzm: upgrading PHP/Apache on matomo1002 (piwik.wikimedia.org)
* 09:40 moritzm: installing apache security updates on thanos-fe hosts
* 09:38 moritzm: installing apache security updates on parsoid hosts
* 09:31 effie: depool mw2383 [[phab:T286463|T286463]]
* 09:18 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 09:15 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277116|T277116]]
* 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T277116|T277116]]
* 08:59 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
* 08:59 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
* 08:45 effie: depool mw2383 - [[phab:T286463|T286463]]
* 08:02 moritzm: upgrade bullseye pilot installs to latest state of bullseye
* 07:06 moritzm: installing apache security updates on codfw mw* hosts
* 06:53 elukey: systemctl reset-failed ifup@ens5 on gitlab2001 - [[phab:T273026|T273026]]
* 06:06 effie: pool mw2383 - [[phab:T286463|T286463]]
* 04:09 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:05 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 08m 28s)
* 03:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
* 03:55 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 02m 22s)
* 03:54 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.76` on canary `wdqs1003`; proceeding to rest of fleet
* 03:53 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
* 03:53 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.76`. Pre-deploy tests passing on canary `wdqs1003`


== 2020-01-27 ==
== 2021-07-12 ==
* 23:40 eileen: civicrm revision changed from {{Gerrit|fbd5c35fb0}} to {{Gerrit|ac730a6bcb}}, config revision is {{Gerrit|837b9d0703}}
* 23:57 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1896efc27f3de39659673091bc4c43ad874da0c5}}: Add sayahna.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T286163|T286163]]) (duration: 00m 56s)
* 23:10 vgutierrez: rolling restart of varnish-frontend in cp4026 and cp4027
* 23:51 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=[[phab:T286396|T286396]] # [[phab:T286396|T286396]]
* 23:06 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:50 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # [[phab:T286396|T286396]]
* 23:06 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 23:50 urbanecm: Delete Project:BROKENPesak at sr.wikipedia to be able to rerun namespaceDupes.php ([[phab:T286396|T286396]])
* 23:01 _joe_: restart apache on gerrit
* 23:45 urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # [[phab:T286396|T286396]]
* 22:58 vgutierrez: restarting gerrit service
* 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|284216a7d35c815ea203a9c0bd738a1e1bf31f7e}}: Add few namespace aliases for Serbian Wikipedia ([[phab:T286396|T286396]]) (duration: 00m 56s)
* 22:01 vgutierrez: restarting varnish-fe on cp4028
* 23:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8a79bf752ff5eb15f3042fd94ba10c2c50607a85}}: enwiki: Delete Book namespace ([[phab:T285766|T285766]]) (duration: 00m 57s)
* 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3311 - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10277 and previous config saved to /var/cache/conftool/dbconfig/20200127-191614-marostegui.json
* 23:29 urbanecm@deploy1002: Synchronized static/images/: {{Gerrit|d007b9ccb77db9f3dc492df7a35477e5563a921a}}: Remove unused celebration logos and wordmark ([[phab:T286380|T286380]]) (duration: 00m 57s)
* 19:15 marostegui: Remove partitions from db2085 enwiki - [[phab:T239453|T239453]]
* 23:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6c581493fbe5d9c372fd44635b704d04040d8b38}}: Add editautoreviewprotected to bot on hewikisource ([[phab:T275076|T275076]]) (duration: 00m 57s)
* 13:58 vgutierrez: repooling cp4030 - [[phab:T243634|T243634]]
* 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|40eade4131eac95ba3dc0d918ad540070d7bcb99}}: Enable RelatedArticles Extension in zhwikinews ([[phab:T266933|T266933]]) (duration: 00m 57s)
* 13:54 vgutierrez: restarting varnish-fe on cp4030 - [[phab:T243634|T243634]]
* 23:15 urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=zhwiktionary --fix --add-prefix=BROKEN # [[phab:T286101|T286101]], P16817
* 13:54 vgutierrez: repooling cp4029 - [[phab:T243634|T243634]]
* 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5ab00d188bc4161e40455b842f613698548b3518}}: zhwiktionary: Add templateeditor right ([[phab:T286101|T286101]]) (duration: 00m 57s)
* 13:36 vgutierrez: restarting varnish-fe on cp4029 - [[phab:T243634|T243634]]
* 23:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5822b2be129b934939af46bab5b8916039661e97}}: zhwiktionary: Add aliases for namespaces ([[phab:T286101|T286101]]) (duration: 00m 57s)
* 12:10 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --from-id 1860 --to-id 1860 ([[phab:T243705|T243705]])
* 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ba0967f5c18652d02b7b476e9592b81dcb9b74fc}}: zhwiktionary: Add Reconstruction namespace ([[phab:T286101|T286101]]) (duration: 00m 57s)
* 03:29 gehel: restarting blazegraph on wdqs100[57]
* 22:53 legoktm: root@urldownloader2002:/var/cache/apt# rm -rf * to free up space
* 21:26 urbanecm: Start server-side upload for 2 video files ([[phab:T286432|T286432]], [[phab:T286433|T286433]])
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@200b502]: Finalize event_default gobblin job - [[phab:T271232|T271232]] (duration: 03m 39s)
* 18:37 otto@deploy1002: Started deploy [analytics/refinery@200b502]: Finalize event_default gobblin job - [[phab:T271232|T271232]]
* 18:12 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score using Shellbox on testwiki ([[phab:T257066|T257066]]) (duration: 00m 58s)
* 16:15 ppchelko@deploy1002: Finished deploy [restbase/deploy@b05ade3]: Add newly created wikis [[phab:T284929|T284929]] [[phab:T284457|T284457]] [[phab:T284392|T284392]] (duration: 21m 24s)
* 16:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]] - extending downtime
* 16:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]] - extending downtime
* 15:54 ppchelko@deploy1002: Started deploy [restbase/deploy@b05ade3]: Add newly created wikis [[phab:T284929|T284929]] [[phab:T284457|T284457]] [[phab:T284392|T284392]]
* 15:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]]
* 15:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T277116|T277116]]
* 15:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277116|T277116]]
* 15:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T277116|T277116]]
* 15:24 elukey: expand ML k8s iBGP neighbors to include the master nodes (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/704104)
* 15:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277116|T277116]]
* 15:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T277116|T277116]]
* 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1002.wikimedia.org
* 15:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277116|T277116]]
* 15:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T277116|T277116]]
* 15:00 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1002.wikimedia.org
* 14:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change [[phab:T277116|T277116]]
* 14:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change [[phab:T277116|T277116]]
* 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1001.wikimedia.org
* 14:44 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1001.wikimedia.org
* 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2004.wikimedia.org
* 14:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2004.wikimedia.org
* 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2003.wikimedia.org
* 14:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2003.wikimedia.org
* 14:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 13:59 otto@deploy1002: Finished deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo - [[phab:T271232|T271232]] (duration: 03m 30s)
* 13:56 otto@deploy1002: Started deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo - [[phab:T271232|T271232]]
* 13:52 otto@deploy1002: Finished deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - [[phab:T271232|T271232]] (duration: 03m 16s)
* 13:49 otto@deploy1002: Started deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - [[phab:T271232|T271232]]
* 13:36 otto@deploy1002: Finished deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - [[phab:T271232|T271232]] (duration: 03m 37s)
* 13:32 otto@deploy1002: Started deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - [[phab:T271232|T271232]]
* 12:51 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:48 volans@cumin2002: START - Cookbook sre.dns.netbox
* 12:42 volans: reverting Primary IP allocation for pc1011-1014, leaving only mgmt IPs - [[phab:T282484|T282484]]
* 12:34 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps2004.codfw.wmnet
* 11:58 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:703567{{!}}Enable template search improvements on first wikis 2/2 (T284553)]] (duration: 00m 57s)
* 11:54 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:703566{{!}}Enable template search improvements on first wikis 1/2 (T284553)]] (duration: 00m 56s)
* 11:49 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/VisualEditor/modules/ve-mw/ui/widgets/ve.ui.MWTemplateTitleInputWidget.js: Backport: [[gerrit:703649{{!}}Always add 1 prefixsearch match when searching for templates]] (duration: 00m 57s)
* 11:47 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps100[1-4].eqiad.wmnet
* 11:45 hnowlan: adjusting weights of eqiad maps servers to reduce load on older spec machines
* 11:40 moritzm: installing apache updates on mw1/eqiad hosts
* 11:38 hnowlan: adjusting weights of codfw maps servers to reduce load on older spec machines
* 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2004.codfw.wmnet
* 11:34 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|773c956811cba5c3a2cbba32bc1e1a536dbd9f0b}}: Revert "Use ptwiki 20th anniversary logos" ([[phab:T286380|T286380]]) (duration: 00m 57s)
* 11:34 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2003.codfw.wmnet
* 11:33 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2001.codfw.wmnet
* 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cd5f5375b4f712c56e9396cc550078272ef668de}}: Revert "ptwiki: Use celebration logos in new vector" ([[phab:T286380|T286380]]) (duration: 00m 57s)
* 11:26 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:702761{{!}}Add 'editautoreviewprotected' protection level to hewikisource (T275076)]] (duration: 00m 57s)
* 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 11:19 hnowlan: testing a depool of maps2010 to ensure kartotherian load can cope with two fewer nodes
* 11:12 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:703568{{!}}Enable transclusion back button on first wikis (T284553)]] (duration: 00m 58s)
* 11:01 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2008.codfw.wmnet
* 10:58 hnowlan: testing a depool of maps2008 to ensure kartotherian load can cope with one less node
* 10:30 moritzm: installing apache updates on an-tool* hosts (affects Turnilo, Yarn, Superset, Hue) briefly
* 10:11 elukey: add 10g disk to ml-serve-ctrl[12]00[12] for [[phab:T285927|T285927]]
* 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
* 10:05 mutante: planet - deleting state files, manually running update for all 161 en feeds - [[phab:T285251|T285251]]
* 10:03 effie: depool mw2383
* 10:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
* 10:01 godog: test thanos-compact upload with smaller part size - [[phab:T285835|T285835]]
* 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
* 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
* 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1006.eqiad.wmnet
* 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
* 09:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
* 09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1006.eqiad.wmnet
* 09:07 godog: repool thanos-fe2002 - [[phab:T285835|T285835]]
* 08:38 godog: test a single frontend for thanos-swift / thanos-query to test "bad host" theory - [[phab:T285835|T285835]]
* 08:26 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/client: Backport: [[gerrit:703890{{!}}Remove subscribing to other aspect for entity usage (T286193)]] (duration: 00m 59s)
* 07:44 jynus: restart db1102:x1 mariadb instance
* 07:01 moritzm: installing apache2 security updates
* 05:14 Amir1: start of mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --batch-size=10 --verbose --mime="application/pdf" --force --sleep 5 on screen - It will take days / week to finish ([[phab:T275268|T275268]])
* 05:06 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: [[gerrit:703951{{!}}Enable json image metadata everywhere (T275268)]] (duration: 01m 05s)
* 04:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/maintenance/refreshImageMetadata.php: Backport: [[gerrit:703891{{!}}Add --sleep option to refreshImageMetadata.php]] (duration: 01m 04s)
* 04:10 Amir1: mwscript refreshImageMetadata.php --wiki=testcommonswiki --mediatype=OFFICE --batch-size=20 --verbose --mime="application/pdf" --force ([[phab:T275268|T275268]])
* 04:08 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: [[gerrit:703950{{!}}Set testcommonswiki to use json image metadata (T275268)]] (duration: 01m 10s)


== 2020-01-26 ==
== 2021-07-09 ==
* 21:45 akosiaris: repool maps1003
* 23:28 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 21:45 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=maps1003.*
* 23:27 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 21:42 akosiaris: test depool maps1003
* 22:36 legoktm: running benchmarking scripts against shellbox
* 21:42 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=maps1003.*
* 14:49 otto@deploy1002: Finished deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - [[phab:T271232|T271232]] (duration: 03m 08s)
* 21:38 vgutierrez: powercycling cp3051 - [[phab:T238305|T238305]]
* 14:46 otto@deploy1002: Started deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - [[phab:T271232|T271232]]
* 21:23 akosiaris: restart kartotherian on maps1002
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1118', diff saved to https://phabricator.wikimedia.org/P16809 and previous config saved to /var/cache/conftool/dbconfig/20210709-115609-marostegui.json
* 21:19 vgutierrez: restart varnish-fe and ats-tls on cp3056
* 11:40 _joe_: deleting coredns pod in codfw, potentially causing [[phab:T286360|T286360]]
* 21:02 bblack: ats-tls-restart on cp3064
* 10:13 _joe_: recreated all pods for zotero in codfw
* 20:51 bblack: esams text caches: reverting earlier sysctl mitigations
* 00:47 legoktm: zotero rolling restart didn't help, filed [[phab:T286360|T286360]] for DNS issues
* 18:11 volans: shutdown elastic2043 - [[phab:T243715|T243715]]
* 00:39 legoktm: doing a rolling restart of zotero in codfw to hopefully fix DNS ENOTFOUND issues
* 18:01 volans: depooled elastic2043 - [[phab:T243715|T243715]]
* 18:01 volans@cumin1001: conftool action : set/pooled=inactive; selector: name=elastic2043.codfw.wmnet
* 17:28 elukey: restart varnishkafka-webrequest on cp3064
* 17:25 elukey: restart varnishkafka-webrequest on cp3056
* 17:03 bblack: reduce /proc/sys/net/ipv4/tcp_max_syn_backlog to 8192 on esams text caches
* 16:55 bblack: reduce /proc/sys/net/ipv4/tcp_synack_retries to 1 on esams text caches
* 16:42 cdanis: ✔️ cdanis@cp4030.ulsfo.wmnet ~ 🕦☕ sudo depool
* 16:38 bblack: applying GRE MTU mitigation from [[phab:T232602|T232602]] to all cp1, cp3, cp5 cache nodes
* 15:43 XioNoX: 3*prepend in esams/knams
* 15:26 elukey: repool deployed
* 15:24 elukey: repool esams
* 15:01 cdanis: deployed
* 15:00 cdanis: depool esams
* 14:56 XioNoX: enabling netflow sampling on the knams-esams links (esams side)
* 11:25 effie: restarted tilerator and tileratorui on maps1002
* 11:23 effie: restarted tilerator and tileratorui on maps1001
* 10:38 effie: deployed
* 10:37 effie: Pool esams back
* 01:12 cdanis: deployed
* 01:12 cdanis: depool esams with new geo-maps-esams-offline


== 2020-01-25 ==
== 2021-07-08 ==
* 12:49 Urbanecm: Run mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=mediawikiwiki --logwiki=metawiki TokyVrpns Mike20LCN ([[phab:T243668|T243668]])
* 22:48 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Add configuration to use Score with Shellbox (still disabled) (2/2) - [[phab:T281423|T281423]] (duration: 00m 57s)
* afk: restarting gerrit-replica
* 22:46 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add configuration to use Score with Shellbox (still disabled) (1/2) - [[phab:T281423|T281423]] (duration: 00m 58s)
* 19:29 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/includes/Score.php: Allow setting a different path for `convert` just for Score (2/2) (duration: 00m 57s)
* 19:27 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/extension.json: Allow setting a different path for `convert` just for Score (1/2) (duration: 00m 58s)
* 18:56 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:55 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:53 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 17:02 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1] (duration: 05m 38s)
* 16:56 joal@deploy1002: Started deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1]
* 16:47 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1] (duration: 03m 17s)
* 16:44 joal@deploy1002: Started deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1]
* 15:37 otto@deploy1002: Finished deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - [[phab:T271232|T271232]] (duration: 03m 06s)
* 15:34 otto@deploy1002: Started deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - [[phab:T271232|T271232]]
* 15:29 otto@deploy1002: Finished deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - [[phab:T271232|T271232]] (duration: 05m 27s)
* 15:23 otto@deploy1002: Started deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - [[phab:T271232|T271232]]
* 15:11 otto@deploy1002: Finished deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - [[phab:T271232|T271232]] (duration: 05m 42s)
* 15:05 otto@deploy1002: Started deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - [[phab:T271232|T271232]]
* 14:52 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add consumers.analytics_hadoop-ingestion stream config settings for automated gobblin imports - [[phab:T271232|T271232]] [[phab:T273901|T273901]] (duration: 01m 09s)
* 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16807 and previous config saved to /var/cache/conftool/dbconfig/20210708-134421-root.json
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16806 and previous config saved to /var/cache/conftool/dbconfig/20210708-132917-root.json
* 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16805 and previous config saved to /var/cache/conftool/dbconfig/20210708-131414-root.json
* 13:04 otto@deploy1002: Finished deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - [[phab:T271232|T271232]] (duration: 03m 22s)
* 13:01 otto@deploy1002: Started deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - [[phab:T271232|T271232]]
* 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16804 and previous config saved to /var/cache/conftool/dbconfig/20210708-125910-root.json
* 12:52 moritzm: installing klibc security updates on buster
* 12:38 moritzm: installing openexr security updates
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103', diff saved to https://phabricator.wikimedia.org/P16803 and previous config saved to /var/cache/conftool/dbconfig/20210708-105353-marostegui.json
* 10:20 jbond: upgrade golang-cfssl
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16802 and previous config saved to /var/cache/conftool/dbconfig/20210708-100947-root.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16801 and previous config saved to /var/cache/conftool/dbconfig/20210708-095443-root.json
* 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16800 and previous config saved to /var/cache/conftool/dbconfig/20210708-093939-root.json
* 09:25 jbond: upload golang-github-cloudflare-cfssl_1.6.0-1_amd64 to bullseye
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16799 and previous config saved to /var/cache/conftool/dbconfig/20210708-092436-root.json
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2116', diff saved to https://phabricator.wikimedia.org/P16798 and previous config saved to /var/cache/conftool/dbconfig/20210708-092411-marostegui.json
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16797 and previous config saved to /var/cache/conftool/dbconfig/20210708-090456-root.json
* 09:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16796 and previous config saved to /var/cache/conftool/dbconfig/20210708-084952-root.json
* 08:50 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:42 moritzm: imported ganeti 2.16.0 for stretch-security/component/ganeti216 [[phab:T284811|T284811]]
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16795 and previous config saved to /var/cache/conftool/dbconfig/20210708-083449-root.json
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16794 and previous config saved to /var/cache/conftool/dbconfig/20210708-081945-root.json
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2130', diff saved to https://phabricator.wikimedia.org/P16793 and previous config saved to /var/cache/conftool/dbconfig/20210708-081922-marostegui.json
* 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16792 and previous config saved to /var/cache/conftool/dbconfig/20210708-060812-root.json
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16791 and previous config saved to /var/cache/conftool/dbconfig/20210708-055309-root.json
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16790 and previous config saved to /var/cache/conftool/dbconfig/20210708-053805-root.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16789 and previous config saved to /var/cache/conftool/dbconfig/20210708-052302-root.json
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092', diff saved to https://phabricator.wikimedia.org/P16788 and previous config saved to /var/cache/conftool/dbconfig/20210708-052216-marostegui.json


== 2020-01-24 ==
== 2021-07-07 ==
* 22:31 mutante: ganeti1003 - sudo gnt-instance remove etherpad1001.eqiad.wmnet ([[phab:T224580|T224580]])
* 20:22 legoktm: repooling eqiad - https: