Difference between revisions of "Server Admin Log"

imported>Stashbot
(cdanis: repool esams 🙌 😎)
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(468 intermediate revisions by 3 users not shown)
== 2020-02-27 ==
== 2021-08-03 ==
* 00:56 cdanis: repool esams 🙌 😎
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 00:42 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:21 jforrester@deploy1001: Synchronized w/extract2.php: [[phab:T239975|T239975]]: Use Article::getPage()->getTouched(), not Article::getTouched (duration: 01m 04s)
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 00:17 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 01m 04s)
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 00:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T232140|T232140]]: Merge definition of wgLogos and wgLogo (duration: 01m 04s)
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 00:13 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T232140|T232140]]: Stop setting wgLogoHD from wgLogos (duration: 01m 05s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 00:02 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 01m 03s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 00:01 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T246212|T246212]] Stop setting wgULSLanguageDetection in IS, set in CS (duration: 01m 05s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master [[phab:T287574|T287574]]
* 09:12 moritzm: installing php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2020-02-26 ==
== 2021-08-02 ==
* 23:59 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T246212|T246212]] Set wgULSLanguageDetection false in CS (duration: 01m 04s)
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:55 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 01m 04s)
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 23:54 James_F: jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[phab:T246193|T246193]] Stop setting wgAllowTitlesInSVG, never read (and this was default anyway) (duration: 01m 05s)
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:16 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:15 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:58 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 22:58 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 22:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 22:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:31 tzatziki: removing 1 file for legal compliance
* 22:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:16 tzatziki: removing 7 files for legal compliance
* 22:48 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:47 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:44 foks: removing one file for legal compliance
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:27 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:25 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 22:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:00 urbanecm: Morning B&C window completed
* 22:16 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 21:52 Urbanecm: Password reset for User:Joax ([[phab:T242941|T242941]])
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 21:28 mutante: ganeti - shutting apt2001 down again
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:574454{{!}}Decrease the reads for term store for clients down to Q2Mio (T219123)]], take II (duration: 01m 04s)
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:16 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:574454{{!}}Decrease the reads for term store for clients down to Q2Mio (T219123)]] (duration: 01m 04s)
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 21:15 mutante: ganeti - re-starting apt2001 which is mysteriously broken and "half up" ..as in you can't ssh to it and don't get console but it does cause icinga alerts
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:35 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.21/extensions/Wikibase/lib/includes/Store/Sql/Terms: SWAT: [[gerrit:575055{{!}}Do prefetching entity ids on batches of 20 entity per query (T246159)]] (duration: 01m 04s)
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 20:20 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.21 refs [[phab:T233869|T233869]] (duration: 01m 04s)
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 20:19 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.21 refs [[phab:T233869|T233869]]
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 20:18 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:10 XioNoX: add BGP to AS4780 in Equinix Palo Alto
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 20:09 XioNoX: add BGP to AS8859 in AMS-IX
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 20:00 Amir1: Morning SWAT is done
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 19:58 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q6Mio (T219123)]], take II (duration: 01m 04s)
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 19:56 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q6Mio (T219123)]] (duration: 01m 02s)
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:09 bstorm_: downtimed labstore1004/5, cloudstore1008/9 and cloudbackup1001/2 for merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/571821
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 18:05 mutante: phab1001 - manually running community_metrics and project_changes scripts (crons) ([[phab:T244677|T244677]])
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:49 Amir1: setting cache type of mwdebug1001 to LCStoreStaticArray, this would break group1 and group2 in that node ([[phab:T99740|T99740]])
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 17:42 XioNoX: remove ns2 redirect to eqiad on cr3-knams
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 17:40 XioNoX: re-enable transits on cr3-esams
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:09 robh: cr2-esams work done, cr3-esams linecard swap starting now via [[phab:T245825|T245825]]
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:40 robh: please note cr2-esams work is ongoing via [[phab:T246009|T246009]] and its downtime is expected
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 16:00 jynus: deploy new grants to phabricator stats user to database on m3 [[phab:T246105|T246105]]
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 15:51 jynus: starting s2, s3 eqiad backup source data check; expect increase read traffic on db1095:3313, db1140:3312, db1078, db1090:3312 [[phab:T244958|T244958]]
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 15:25 addshore: addshore@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=50 --sleep=1 --file=20to30holes-25feb2229 # [[phab:T219123|T219123]]
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 15:19 volans@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 15:17 volans@cumin1001: START - Cookbook sre.hosts.decommission
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 14:54 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:54 volans@cumin2001: START - Cookbook sre.hosts.decommission
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:51 volans@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 14:46 volans@cumin2001: START - Cookbook sre.ganeti.makevm
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 14:19 volans@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 14:19 volans@cumin2001: START - Cookbook sre.hosts.decommission
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 14:12 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 14:11 volans@cumin2001: START - Cookbook sre.hosts.decommission
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 14:05 gehel: restart of elasticsearch on cloudelastic for JVM upgrade completed
* 12:20 mutante: gerrit servers: disabling puppet
* 14:03 XioNoX: deactivate BGP to AS23930 on cr1-eqsin, will re-enable when their technical issues are fixed and they notify us
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 14:00 elukey: run apt-get clean on notebook1004 to free some space - [[phab:T224682|T224682]]
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 13:46 XioNoX: ganeti2001:~$ sudo gnt-instance shutdown apt2001.wikimedia.org - [[phab:T224576|T224576]]
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 12:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 12:26 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 12:24 kartik@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit{{!}}416973{{!}}ContentTranslation: Set cookieDomain for Production]] (duration: 01m 04s)
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 12:11 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}574469{{!}}Enable CX out of beta in eu, sw, and ta Wikipedias (T245446, T245447, T245448)]] take II (duration: 01m 05s)
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 12:10 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit{{!}}574469{{!}}Enable CX out of beta in eu, sw, and ta Wikipedias (T245446, T245447, T245448)]] (duration: 01m 15s)
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 12:05 volans: uploaded spicerack_0.0.31-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 11:27 hashar: restarting Jenkins on contint2001
* 11:45 jbond42: changing uid/gid of reprepro affects release[12]001/install[12]002
* 11:27 hashar: restarting Jenkins on contint1001
* 11:05 moritzm: rolling out remaining PHP 7.0 security updates
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:52 moritzm: installing clamav security updates on mendelevium (ticket.wikimedia.org)
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 10:03 elukey: upgrade prometheus-mcrouter-exporter 0.1.0+git20200225-1 to all cumin alias parsoid/deployment-servers/mw-maintenance
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:54 elukey: upgrade prometheus-mcrouter-exporter 0.1.0+git20200225-1 to all cumin alias all-mw-eqiad
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 09:37 elukey@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:34 elukey: roll restart the Hadoop Analytics workers for openjdk upgrades
* 11:13 urbanecm: EU B&C window completed
* 09:32 elukey: upgrade prometheus-mcrouter-exporter 0.1.0+git20200225-1 to all cumin alias all-mw-codfw
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 09:18 gehel: restarting elasticsearch on cloudelastic for JVM upgrade
* 11:08 moritzm: installing openjdk-11 security updates
* 08:51 elukey: upload prometheus-mcrouter-exporter 0.1.0+git20200225-1 to stretch-wikimedia
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 08:38 elukey: upgrade prometheus-mcrouter-exporter on mwdebug1001 to test the new version
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 06:19 marostegui: Stop MySQL and poweroff db1084 for BBU replacement - [[phab:T245647|T245647]]
* 07:24 moritzm: installing libsndfile security updates on buster
* 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1019 after on-site maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10530 and previous config saved to /var/cache/conftool/dbconfig/20200226-061710-marostegui.json
* 07:12 moritzm: installing aspell security updates
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Restore es1017 (master) original weight (0) [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10529 and previous config saved to /var/cache/conftool/dbconfig/20200226-061640-marostegui.json
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for BBU replacement - [[phab:T245647|T245647]]', diff saved to https://phabricator.wikimedia.org/P10528 and previous config saved to /var/cache/conftool/dbconfig/20200226-060906-marostegui.json
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:41 kart_: Updated cxserver to 2020-02-24-110149-production ([[phab:T227183|T227183]])
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)
* 05:35 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 05:31 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 05:29 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 01:15 ejegg: updated payments-wiki from {{Gerrit|c3ca3ad6a7}} to {{Gerrit|bfae734204}}
* 00:48 eileen: civicrm revision changed from {{Gerrit|bec2d6ad9f}} to {{Gerrit|62e62e107c}}, config revision is {{Gerrit|c0ef31e2fd}}
* 00:21 James_F: Manually purged https://de.wikipedia.org/w/index.php?title=Hans-Werner_Sahm&action=history from mwmaint1002
* 00:15 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 01m 03s)
* 00:15 James_F: SWAT complete.
* 00:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T242381|T242381]] Set Vector skin version defaults so they can be changed on Beta Cluster (duration: 01m 04s)
* 00:09 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bonus sync for cache clearance (duration: 01m 03s)
* 00:08 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245792|T245792]] Enable password-reset-update on Wikivoyages and Wiktionaries (duration: 01m 04s)
* 00:08 ebernhardson: resume writes from mediawiki to cloudelastic


== 2020-02-25 ==
== 2021-07-31 ==
* 23:51 XioNoX: cr2-esams> request chassis fpc slot 0 offline - [[phab:T246009|T246009]]
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 23:38 ebernhardson: pause mediawiki writes to cloudelastic to let old gc on cloudelastic1001-chi recover
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
* 23:30 mutante: notebook1004 - disk full once again ([[phab:T232068|T232068]])
* 23:28 mutante: adding mw2366 through mw2376 to site
* 22:17 jhuneidi@deploy1001: Synchronized php-1.35.0-wmf.21/includes/Defines.php: Update MW_VERSION to 1.35.0-wmf.21 (duration: 01m 04s)
* 22:17 mutante: scandium restarting php7.2-fpm
* 22:15 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 22:15 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 21:29 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.21 refs [[phab:T233869|T233869]]
* 21:19 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.21 refs [[phab:T233869|T233869]] (duration: 75m 21s)
* 20:42 eileen: process-control config revision is {{Gerrit|c0ef31e2fd}}
* 20:32 eileen: process-control config revision is {{Gerrit|e17d104c73}} slow down delete deleted contacts
* 20:28 tzatziki: reset password for ClioCJS
* 20:25 tzatziki: changing email address for ClioCJS
* 20:25 mutante: apt.wikimedia.org (current install* and new apt* roles) - going ECDSA-only and removing RSA certificate from nginx config - to support buster without having to maintain patched nginx for duplicate ssl_stapling_file directive - at the cost of slightly reduced back-compat on the public repo ([[phab:T242602|T242602]])
* 20:24 mutante: apt.wikimedia.org (current install* and new apt* roles) - going ECDSA-only and removing RSA certificate from nginx config - to support buster without having to maintain patched nginx for duplicate ssl_stapling_file directive - at the cost of slightly reduced back-compat on the public repo ([[phab:T224576|T224576]])
* 20:18 eileen: process-control config revision is {{Gerrit|e17d104c73}}
* 20:04 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.21 refs [[phab:T233869|T233869]]
* 20:01 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.19 (duration: 14m 35s)
* 19:58 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:55 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:54 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:52 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:47 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:47 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:45 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:44 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:39 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:30 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:26 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:26 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 19:23 longma: 1.35.0-wmf.21 was branched at {{Gerrit|ed65726f0dcaf2b163ba44426d5e780bc7f8895d}} for [[phab:T233869|T233869]]
* 19:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 19:20 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:03 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Decrease the reads for term store for clients back to Q2Mio (T219123)]], take II (duration: 00m 56s)
* 18:01 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Decrease the reads for term store for clients back to Q2Mio (T219123)]] (duration: 00m 56s)
* 18:00 jynus@cumin1001: dbctl commit (dc=all): 'increase s8 special replica weight', diff saved to https://phabricator.wikimedia.org/P10520 and previous config saved to /var/cache/conftool/dbconfig/20200225-180016-jynus.json
* 17:21 jynus@cumin1001: dbctl commit (dc=all): 'increase es1019 load to 50% [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10519 and previous config saved to /var/cache/conftool/dbconfig/20200225-172133-jynus.json
* 17:15 vgutierrez: restart ats-tls on cp1075 - [[phab:T244538|T244538]]
* 17:10 ejegg: restarted new Ingenico recurring donation charge job
* 17:02 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q6Mio (T219123)]], take II (duration: 00m 55s)
* 17:01 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:01 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:01 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:01 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:00 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q6Mio (T219123)]] (duration: 00m 56s)
* 16:45 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics-external' for release 'production' .
* 16:38 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q4Mio (T219123)]], take II (duration: 00m 56s)
* 16:36 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q4Mio (T219123)]] (duration: 00m 56s)
* 16:25 vgutierrez: enable BGP in lvs2009 - [[phab:T196560|T196560]] [[phab:T245984|T245984]]
* 16:17 godog: restart debmonitor / puppetboard - [[phab:T245512|T245512]]
* 16:17 moritzm: installing pillow security updates
* 16:09 vgutierrez: update puppet compiler facts
* 16:08 XioNoX: add BGP to lvs2009 on cr1/2-codfw
* 16:02 jynus@cumin1001: dbctl commit (dc=all): 'repool es1019 with low load after maintenance [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10516 and previous config saved to /var/cache/conftool/dbconfig/20200225-160215-jynus.json
* 16:00 ejegg: restarted legacy Ingenico recurring donation charge job
* 15:59 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q2Mio (T219123)]], take II (duration: 00m 55s)
* 15:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:58 ejegg: updated Fundraising CiviCRM from {{Gerrit|88c72e39ca}} to {{Gerrit|bec2d6ad9f}}
* 15:58 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q2Mio (T219123)]] (duration: 00m 56s)
* 15:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:36 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q1Mio (T219123)]], take II (duration: 00m 55s)
* 15:34 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q1Mio (T219123)]] (duration: 00m 56s)
* 15:16 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q512K (T219123)]], take II (duration: 00m 55s)
* 15:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q512K (T219123)]] (duration: 00m 56s)
* 15:06 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q256K (T219123)]], take II (duration: 00m 55s)
* 15:02 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q256K (T219123)]] (duration: 00m 56s)
* 14:46 godog: roll restart netbox uwsgi - [[phab:T245511|T245511]]
* 14:40 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:39 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:39 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:39 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:37 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 14:35 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase/lib: [[gerrit:574746{{!}}wbterms: only select entity terms that are requested (T246005)]] (duration: 01m 02s)
* 14:30 vgutierrez: restart pybal with BGP enabled on lvs2010 - [[phab:T245984|T245984]] [[phab:T196560|T196560]]
* 14:20 vgutierrez: update puppet compiler facts
* 14:16 bblack: dns1002 - start reimage - [[phab:T241770|T241770]]
* 14:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:574743{{!}}Reinstate wgULSLanguageDetection setting (T246071)]] (duration: 01m 03s)
* 14:14 XioNoX: add bgp session to 10.192.49.7 (lvs2010) on cr1/cr2-codfw
* 14:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:01 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:42 godog: roll-restart logstash in eqiad/codfw - [[phab:T227080|T227080]]
* 13:28 Urbanecm: mwscript updateSpecialPages.php --wiki=enwiki --override --only=Mostcategories
* 13:00 Urbanecm: Run mwscript updateSpecialPages.php --wiki=enwiki --override --only=Uncategorizedcategories, cron didn't do that for several months ([[phab:T246063|T246063]])
* 12:51 marostegui: Stop mysql on es1019 - [[phab:T243963|T243963]]
* 12:49 bblack: dns1002 - shutdown for hardware work after confirming drain of live requests - [[phab:T241770|T241770]]
* 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1019 for on-site maintenance - [[phab:T243963|T243963]]', diff saved to https://phabricator.wikimedia.org/P10512 and previous config saved to /var/cache/conftool/dbconfig/20200225-124650-marostegui.json
* 12:44 bblack: dns1002 - downtimed, disabled puppet, and depool (stop BGP adverts) for hardware work - [[phab:T241770|T241770]]
* 12:33 Urbanecm: Run mwscript updateSpecialPages.php --wiki=enwiki --override --only=Wantedtemplates, cron didn't do that for several months ([[phab:T246063|T246063]])
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Increase traffic on db1107 for 10.4 on special groups 10 -> 50 - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10511 and previous config saved to /var/cache/conftool/dbconfig/20200225-123222-marostegui.json
* 12:14 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: {{Gerrit|1f58d9a}}: New throttle rule for arwiki WikiGap ([[phab:T246092|T246092]]) (duration: 00m 56s)
* 12:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|cdde3a2}}: {{Gerrit|db90d22}} ([[phab:T245525|T245525]], [[phab:T243359|T243359]]) (duration: 00m 58s)
* 10:11 volans: re-enabling puppet on A:swift-be-eqiad
* 09:31 volans: re-enabling puppet on A:swift-be-codfw
* 09:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:30 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:10 addshore: addshore@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=50 --sleep=1 --file=10to20holes-24feb1345 # [[phab:T219123|T219123]]
* 09:09 addshore: addshore@mwmaint1002:~$ time mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=50 --sleep=1 --file=10to20holes-24feb1345
* 08:23 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 08:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 for 10.4 testing in main API and special groups - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10510 and previous config saved to /var/cache/conftool/dbconfig/20200225-075304-marostegui.json
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to analyze recentchanges table - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10508 and previous config saved to /var/cache/conftool/dbconfig/20200225-065741-marostegui.json
* 06:02 marostegui: Move labsdb1010 under db2094:3318 - [[phab:T232446|T232446]]
* 02:59 ejegg: updated Fundraising CiviCRM from {{Gerrit|b9d1acdb6d}} to {{Gerrit|88c72e39ca}}
* 01:12 jforrester@deploy1001: Synchronized wmf-config/interwiki.php: [[phab:T238803|T238803]]: Update interwiki cache (duration: 00m 56s)
* 00:59 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T238803|T238803]]: Drop ability to load SkinPerPage, EUCopyrightCampaign, and EUCopyrightCampaignSkin (duration: 00m 56s)
* 00:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T238803|T238803]]: Remove all IS config related to the fixcopyrightwiki wiki (duration: 00m 55s)
* 00:51 James_F: Ran `DELETE FROM globalimagelinks WHERE gil_wiki='fixcopyrightwiki';` - one row removed [[phab:T238803|T238803]]
* 00:51 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop trying to read wmgUseSkinPerPage or wmgUseEUCopyrightCampaign (duration: 00m 55s)
* 00:48 James_F: Confirmed no SUL entries for fixcopyrightwiki as expected [[phab:T238803|T238803]]
* 00:47 jforrester@deploy1001: Synchronized static/images/project-logos/: [[phab:T238803|T238803]]: Remove fixcopyrightwiki project logos (duration: 00m 56s)
* 00:46 ejegg: updated Fundraising CiviCRM from {{Gerrit|87b13fd3b5}} to {{Gerrit|b9d1acdb6d}}
* 00:46 jforrester@deploy1001: Synchronized dblists/: [[phab:T238803|T238803]]: Remove fixcopyrightwiki from dblists in general (duration: 00m 58s)
* 00:45 jforrester@deploy1001: rebuilt and synchronized wikiversions files: [[phab:T238803|T238803]]: Remove fixcopyrightwiki from wikiversions
* 00:43 jforrester@deploy1001: Synchronized dblists/all.dblist: [[phab:T238803|T238803]]: Remove fixcopyrightwiki from all.dblist (duration: 00m 56s)
* 00:39 jforrester@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
* 00:38 ejegg: disabled recurring donation charge jobs for CiviCRM update
* 00:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgMaxGeneratedPPNodeCount or wgParserConf::preprocessorClass, never read (duration: 00m 56s)
* 00:23 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T245983|T245983]] Read wmgApprovedContentSecurityPolicyDomains for CSP (duration: 00m 56s)
* 00:21 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245983|T245983]] Set wmgApprovedContentSecurityPolicyDomains (duration: 00m 57s)


== 2020-02-24 ==
== 2021-07-30 ==
* 22:58 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:38 XioNoX: redirect ns2 to authdns1001
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:34 mutante: stat1007 sudo systemctl reset-failed to clear Icinga alerts about reportupdater-pingback.service
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 22:22 XioNoX: disable transits on cr3-esams
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 21:43 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@f87bdd9]: Take service name into account for consumer group name [[phab:T244387|T244387]] (duration: 01m 14s)
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 21:42 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@f87bdd9]: Take service name into account for consumer group name [[phab:T244387|T244387]]
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 21:37 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 21:28 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 21:26 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'canary' .
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:23 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:05 eileen: civicrm revision changed from {{Gerrit|fffc215e75}} to {{Gerrit|87b13fd3b5}}, config revision is {{Gerrit|561ae21f77}}
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 20:58 XioNoX: test flowspec BGP config on cr3-knams
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:32 XioNoX: load new FW policies on pfw3-eqiad/codfw - [[phab:T246036|T246036]]
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* food: updated Fundraising CiviCRM from {{Gerrit|426e3547ca}} to {{Gerrit|fffc215e75}}
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 20:03 eileen: civicrm revision changed from {{Gerrit|c086fd4e0b}} to {{Gerrit|426e3547ca}}, config revision is {{Gerrit|561ae21f77}}
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:02 mutante: installing OS on new ganeti VMs apt1001 and apt2001.wikimedia.org for buster APT repos
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:07 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Changes here are only used in tests right now, but keep line numbers sync'ed (duration: 00m 56s)
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:46 mutante: deploying cluster apache config change - adds gr.wikimedia.org vhost and refreshes apache2
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 17:10 jforrester@deploy1001: Synchronized wmf-config/flaggedrevs.php: Sync doc-only change; should be a no-op (duration: 00m 57s)
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:16 jynus: reloading ferm on ms-be2028 DNS query timed out
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 16:11 jynus: reloading ferm on ms-be2043 DNS query timed out
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 16:02 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q256K (T219123)]], take II (duration: 00m 56s)
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 15:57 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q256K (T219123)]] (duration: 00m 56s)
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 15:30 moritzm: updated component/jdk8 to 8u242-b08-1~deb10u1 (forward port of latest Java 8 security update)
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce weight for db1126, increase it a bit for db1101:3318', diff saved to https://phabricator.wikimedia.org/P10498 and previous config saved to /var/cache/conftool/dbconfig/20200224-152132-marostegui.json
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 15:05 marostegui: Deploy schema change on db1086 (s7 master) with replication - [[phab:T245925|T245925]]
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:59 marostegui: read_only=0 on es1020 (es4) and es1023 (es5) - unused new external store masters - [[phab:T245806|T245806]]
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 14:56 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q256K (T219123)]], take II (duration: 00m 55s)
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 14:55 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q256K (T219123)]] (duration: 00m 57s)
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 14:47 andrew@deploy1001: Finished deploy [horizon/deploy@dab0ca0]: modest css change for the hiera editing dialog (take two -- I consistently forget to rebase before doing this) (duration: 03m 33s)
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 14:44 andrew@deploy1001: Started deploy [horizon/deploy@dab0ca0]: modest css change for the hiera editing dialog (take two -- I consistently forget to rebase before doing this)
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 14:43 andrew@deploy1001: Finished deploy [horizon/deploy@a8f2ea9]: modest css change for the hiera editing dialog (duration: 00m 12s)
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 14:43 andrew@deploy1001: Started deploy [horizon/deploy@a8f2ea9]: modest css change for the hiera editing dialog
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 14:42 marostegui: Compress innodb on wb_terms on db1087 - [[phab:T232446|T232446]]
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 14:03 _joe_: depooling esams (authdns-update)
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 13:51 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q120K (T219123)]], take II (duration: 00m 55s)
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 13:48 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q120K (T219123)]] (duration: 00m 56s)
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 13:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q60K (T219123)]], take II (duration: 00m 56s)
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 13:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q60K (T219123)]] (duration: 00m 56s)
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 13:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q30K (T219123)]], take II (duration: 00m 56s)
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 13:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574454{{!}}Increase the reads for term store for clients for up to Q30K (T219123)]] (duration: 00m 56s)
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 13:05 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 18s)
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 13:01 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573965{{!}}Disallow crats to (un)assign flow-bot group on enwiki (T245716)]] (duration: 00m 56s)
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 12:59 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573965{{!}}Disallow crats to (un)assign flow-bot group on enwiki (T245716)]] (duration: 00m 56s)
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 12:48 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:574398{{!}} Bumping portals to master (563985)]] (duration: 00m 56s)
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 12:47 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:574398{{!}} Bumping portals to master (563985)]] (duration: 00m 56s)
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 12:38 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:571738{{!}}Add definitions for redirect badges (T235420)]], take II, the cache issue (duration: 00m 56s)
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 12:37 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:571738{{!}}Add definitions for redirect badges (T235420)]] (duration: 00m 56s)
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 12:23 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase/client/includes: SWAT: [[gerrit:574391{{!}}Use formatter cache in client LUA label lookups (T245740)]] (duration: 00m 56s)
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 12:19 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/dumpInterwiki.php: dumpInterwiki: Respect comments in dblists ([[phab:T244906|T244906]]) (duration: 00m 56s)
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 12:12 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:574265{{!}}CX: Adjust MT threshold for Telugu WP to 70% (T244769)]] (duration: 00m 56s)
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 12:05 XioNoX: re-enable deactivated BGP sessions from ulsfo to office - [[phab:T239893|T239893]]
* 11:23 moritzm: installing libsndfile security updates on stretch
* 12:02 vgutierrez: reimage pybal-test2001 as buster - [[phab:T224570|T224570]] [[phab:T245984|T245984]]
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 11:49 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:574398{{!}} Bumping portals to master (563985)]] (duration: 00m 55s)
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 11:45 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:574398{{!}} Bumping portals to master (563985)]] (duration: 00m 57s)
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 11:27 vgutierrez: upload pybal 1.15.8 to apt.wm.o (buster) - [[phab:T245984|T245984]]
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 11:06 volans: restarted ferm on ms-be2046
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 11:02 marostegui: Move labsdb1009, labsdb1011 and labsdb1012 (labsdb1010 is currently delayed, will be done later) to replicate under codfw for a few days while we alter wb_terms on db1087 - [[phab:T232446|T232446]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 10:59 effie: upgrading scap in eqiad and codfw - [[phab:T245530|T245530]]
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 10:55 volans: restarted ferm on ms-be2016, had failed with DNS query for 'ms-be2056.codfw.wmnet' failed: query timed out
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 10:41 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase: [[gerrit:574386{{!}}Add metric for recording cache hits in StatsdRecordingSimpleCache]] ([[phab:T244260|T244260]]) (duration: 01m 04s)
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 10:34 godog: onboard netbox to logging pipeline
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 10:12 marostegui: Stop db1087 and db2079 in sync - [[phab:T232446|T232446]]
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for compression and place db1101:3318 into vslow,dump - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10493 and previous config saved to /var/cache/conftool/dbconfig/20200224-101030-marostegui.json
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 09:21 godog: bounce ferm on ms-be2023, it had failed (no entries in journald)
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 09:08 elukey: update puppet compiler's facts
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Add instances to es5 eqiad - [[phab:T245806|T245806]]', diff saved to https://phabricator.wikimedia.org/P10492 and previous config saved to /var/cache/conftool/dbconfig/20200224-084027-marostegui.json
* 08:34 marostegui@deploy1001: Synchronized wmf-config/etcd.php: Add es4 and es5 (unused new external store sections) to etcd - [[phab:T245806|T245806]] (duration: 00m 58s)
* 08:29 marostegui: Temporarily put es1020 (es4) and es1023 (es5) on RO on a mysql level - [[phab:T245806|T245806]]
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add instances to es5 codfw - [[phab:T245806|T245806]]', diff saved to https://phabricator.wikimedia.org/P10491 and previous config saved to /var/cache/conftool/dbconfig/20200224-082848-marostegui.json
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Add instances to es4 eqiad - [[phab:T245806|T245806]]', diff saved to https://phabricator.wikimedia.org/P10490 and previous config saved to /var/cache/conftool/dbconfig/20200224-080708-marostegui.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Add instances to es4 codfw - [[phab:T245806|T245806]]', diff saved to https://phabricator.wikimedia.org/P10489 and previous config saved to /var/cache/conftool/dbconfig/20200224-080128-marostegui.json
* 07:31 cdanis: dbctl: edit es4/es5 sections in eqiad (flavor & master & min_replicas fields) [[phab:T245806|T245806]]
* 07:30 cdanis: dbctl: (and min_replicas field) [[phab:T245806|T245806]]
* 07:29 cdanis: dbctl: edit es4/es5 sections in codfw (flavor & master fields) [[phab:T245806|T245806]]
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1107 for 10.4 testing in special slaves group with weight 10 - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10488 and previous config saved to /var/cache/conftool/dbconfig/20200224-071201-marostegui.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 for 10.4 testing in main and API - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10487 and previous config saved to /var/cache/conftool/dbconfig/20200224-070337-marostegui.json
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3318 after removing partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10486 and previous config saved to /var/cache/conftool/dbconfig/20200224-064044-marostegui.json
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318 after removing partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10485 and previous config saved to /var/cache/conftool/dbconfig/20200224-063258-marostegui.json
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318 after removing partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10484 and previous config saved to /var/cache/conftool/dbconfig/20200224-062226-marostegui.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318 after removing partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10483 and previous config saved to /var/cache/conftool/dbconfig/20200224-060118-marostegui.json
* 05:57 marostegui: Repool labsdb1011 - [[phab:T245797|T245797]]


== 2020-02-23 ==
== 2021-07-29 ==
* 16:52 elukey: powercycle mw1372 - no mgmt console, no ssh
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 15:17 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user='π°‡π±…π°šπ°€' /home/urbanecm/T245950 ([[phab:T245950|T245950]])
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]]
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15 refs [[phab:T281157|T281157]]
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]]
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:11 vgutierrez: restart pybal on lvs2009
* 14:09 vgutierrez: restart pybal on lvs2010
* 14:07 vgutierrez: restart pybal on lvs2008
* 14:05 vgutierrez: restart pybal on lvs2007
* 13:59 vgutierrez: restart pybal on lvs1014
* 13:55 vgutierrez: restart pybal on lvs1015
* 13:52 _joe_: restarting pybal on lvs1016
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:52 moritzm: restarting Tomcat on idp-test
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}


== 2020-02-22 ==
== 2021-07-28 ==
* 03:41 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 03:37 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 02:17 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 02:16 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]]
* 02:13 mutante: ganeti - removing instances apt1001/apt2001 again, starting over
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 01:53 mutante: starting new ganeti VMs apt1001 and apt2001 for OS install (WIP, not prod)
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 01:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 01:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 00:45 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 00:43 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 00:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 00:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 00:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 00:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 00:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 00:15 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 13:29 moritzm: installing python2.7 security updates on stretch
* 13:08 moritzm: installing python3.5 security updates on stretch
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:27 moritzm: installing nginx security updates on thumbor*
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:27 Amir1: running several long-running queries against pc1007
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 07:53 moritzm: installing aspell security updates on stretch
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php


== 2020-02-21 ==
== 2021-07-27 ==
* 23:26 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 23:24 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 23:05 andrewbogott: updated (?) wikitech-static to 1.34.0
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 22:01 sbassett@deploy1001: Finished scap: Deploy security fix for [[phab:T232932|T232932]] (duration: 05m 35s)
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 21:56 sbassett@deploy1001: Started scap: Deploy security fix for [[phab:T232932|T232932]]
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 21:53 andrew@deploy1001: Finished deploy [horizon/deploy@a8f2ea9]: added a warning about the public git history to the hiera edit panel -- take two (duration: 03m 41s)
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 21:49 andrew@deploy1001: Started deploy [horizon/deploy@a8f2ea9]: added a warning about the public git history to the hiera edit panel -- take two
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 21:45 andrew@deploy1001: Finished deploy [horizon/deploy@13ca90a]: added a warning about the public git history to the hiera edit panel (duration: 00m 11s)
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 21:45 andrew@deploy1001: Started deploy [horizon/deploy@13ca90a]: added a warning about the public git history to the hiera edit panel
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 21:23 mutante: LDAP - added ldickinson to wmf
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 21:23 mutante: LDAP - added dduvall to archiva-deployers
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 21:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:20 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 21:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 21:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 20:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 20:52 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 20:50 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 20:38 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 20:36 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 20:29 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 20:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 18:34 XioNoX: re-enable GRE tunnels on cr3-esams - [[phab:T245825|T245825]]
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 15:55 XioNoX: add gobgpd to buster-wikimedia repo
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 15:51 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 15:06 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 13:38 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/includes/resourceloader/ResourceLoaderSkinModule.php: [[phab:T245778|T245778]] [[phab:T245182|T245182]] [[phab:T232140|T232140]] (duration: 01m 00s)
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:29 mark: cr3-esams: Shutdown GRE tunnels over Telia
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 12:27 akosiaris: repool mathoid at eqiad, test complete
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 12:27 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mathoid
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 12:20 moritzm: rebooting boron
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 12:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 12:20 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 12:17 moritzm: bumped memory for boron.eqiad.wmnet to 16G
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 12:04 mark: cr3-esams: request chassis fpc offline slot 1
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 11:57 mark: Disabled Telia transit on cr3-esams
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 11:57 mark: Set VRRP prio cost to 50 on cr3-esams to make it backup VRRP
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 11:48 elukey: restart varnishkafka-webrequest on cp3052 (stuck in timeouts to kafka, analytics alarms raised)
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 11:47 elukey: restart varnishkafka-webrequest on cp3056/cp3058/cp3054/cp3064 (stuck in timeouts to kafka, analytics alarms raised)
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 11:39 elukey: restart varnishkafka on cp3057 (stuck in timeouts to kafka, analytics alarms raised)
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 11:21 godog: bounce logstash on logstash1023 - see if can catch up with elastic7 kafka lag
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 11:14 elukey: reboot stat1005 - GPU blocked at 100% after issue with tensorflow
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 09:18 akosiaris: depool mathoid in eqiad for a test
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 09:18 akosiaris@puppetmaster1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mathoid
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10473 and previous config saved to /var/cache/conftool/dbconfig/20200221-085405-marostegui.json
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 08:34 fdans@deploy1001: Finished deploy [analytics/refinery@4d56021]: deploying refinery (duration: 14m 55s)
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 08:19 fdans@deploy1001: Started deploy [analytics/refinery@4d56021]: deploying refinery
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 08:02 akosiaris: disable mod_remoteip on otrs host, following merge of https://gerrit.wikimedia.org/r/573877
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 06:58 marostegui: Stop MySQL on labsdb1012 to clone labsdb1011 - [[phab:T245797|T245797]]
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 06:58 marostegui: Stop MySQL on labsdb1012 to clone labsdb1011 -
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 06:34 marostegui: Stop mysql on es1024 to clone es1025 - [[phab:T243052|T243052]]
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 05:57 marostegui: Start MySQL on labsdb1011 without replication - [[phab:T245797|T245797]]
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 05:44 marostegui: Reload haproxy on dbproxy1010, dbproxy1011, dbproxy18 - [[phab:T245797|T245797]]
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 02:53 bstorm_: depooled labsdb1011 and set weight 10 on labsdb1009 vs 3 on labsdb1010 [[phab:T245797|T245797]]
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 02:43 ejegg: updated Fundraising CiviCRM from {{Gerrit|a6b222c19f}} to {{Gerrit|c086fd4e0b}}
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 02:27 bstorm_: stopped mariadb on labsdb1011 because it keeps crashing anyway
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 01:05 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Sync Beta-Cluster-only change to CommonSettings now we're sure we won't revert (duration: 00m 56s)
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 01:04 andrew@deploy1001: Finished deploy [horizon/deploy@13ca90a]: Remove guided puppet config mode; this gets us back to working with latest puppet packages. (duration: 03m 32s)
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 01:01 andrew@deploy1001: Started deploy [horizon/deploy@13ca90a]: Remove guided puppet config mode; this gets us back to working with latest puppet packages.
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 14:33 mmandere: depool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:11 moritzm: installing aspell security updates
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support. Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 11:23 Lucas_WMDE: EU backport+config window done
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 08:57 _joe_: repooling mw225[12] for apis
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 08:36 jynus: reenabled puppet on mwmaint1002
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 07:52 jynus: disabling puppet on mwmaint1002
* 07:14 moritzm: installing krb security updates on buster
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json


== 2020-02-20 ==
== 2021-07-26 ==
* 23:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245787|T245787]] [nlwiki] Add noindex for NS_USER and NS_USER_TALK (duration: 00m 56s)
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 23:46 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Stop setting wgVectorPrintLogo for back-compat., not read since wmf.19 (duration: 00m 56s)
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}},
* 23:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw232[0-4].codfw.wmnet
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 23:45 mutante: gerrit1002 - test VM - rebooting for new disk
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 23:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw231[7-9].codfw.wmnet
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 23:33 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw232[0-4].codfw.wmnet
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 23:32 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw231[7-9].codfw.wmnet
* 16:06 legoktm: depooled mw2336.codfw.wmnet, mgmt is down too. [[phab:T287394|T287394]]
* 23:32 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw2381[7-9].codfw.wmnet
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 23:25 mutante: ganeti1003 - adding another virtual 20G disk to gerrit1002 ([[phab:T243808|T243808]])
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 23:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 23:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 23:04 jforrester@deploy1001: Synchronized php-1.35.0-wmf.20/includes/pager/IndexPager.php: IndexPager: Limit offset params to the max of the indices available (duration: 00m 56s)
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 23:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 22:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:28 ebernhardson: restart mjolnir-kafka-bulk-daemon across eqiad
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 22:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 22:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 22:28 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@8908dd1]: daemons: Install stack printing signal handler on SIGUSR1 (duration: 05m 05s)
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 22:23 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@8908dd1]: daemons: Install stack printing signal handler on SIGUSR1
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 21:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245780|T245780]] [mediawikiwiki] Deny the 'flow-hide' right to logged out and non-autoconfirmed users (duration: 00m 56s)
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 20:07 James_F: Train 1.35.0-wmf.20 provisionally looks OK on all wikis. Closing [[phab:T233868|T233868]].
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 20:04 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.20
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 19:55 twentyafterfour: hotfix deployed
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:51 twentyafterfour: deploying phabricator hotfix: https://phabricator.wikimedia.org/rPHEX2f36eee7ce67eb0c09e9bb0e79b42fc3b41d3597 for [[phab:T244165|T244165]]
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:33 bblack: codfw+ulsfo repooled in geodns
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 18:20 fdans@deploy1001: Finished deploy [analytics/refinery@e05ae16]: deploying refinery (duration: 11m 31s)
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 18:08 fdans@deploy1001: Started deploy [analytics/refinery@e05ae16]: deploying refinery
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 17:38 bblack: pushed codfw+ulsfo geodns depool
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 16:45 jynus: stop, upgrade and restart dbprov2002
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 16:26 jynus: stop, upgrade and restart dbprov1002
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 16:23 moritzm: installing Java security updates on Hadoop/Kafka Jumbo/AQS/Druid
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 16:16 jynus: stop, upgrade and restart db1140
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 16:12 moritzm: installing postgres security updates on netboxdb*
* 06:39 moritzm: installing krb5 security updates
* 16:03 fdans@deploy1001: Finished deploy [analytics/aqs/deploy@125cffa]: deploying aqs, third time is the charm (duration: 06m 15s)
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki
* 15:57 fdans@deploy1001: Started deploy [analytics/aqs/deploy@125cffa]: deploying aqs, third time is the charm
* 15:40 marostegui: Poweroff es2022 [[phab:T245714|T245714]]
* 15:32 fdans@deploy1001: Finished deploy [analytics/aqs/deploy@95a7999]: deploying aqs (duration: 00m 48s)
* 15:32 fdans@deploy1001: Started deploy [analytics/aqs/deploy@95a7999]: deploying aqs
* 15:23 fdans@deploy1001: Finished deploy [analytics/aqs/deploy@cbc3241]: deploying aqs (duration: 04m 06s)
* 15:19 fdans@deploy1001: Started deploy [analytics/aqs/deploy@cbc3241]: deploying aqs
* 14:38 Urbanecm: [dry-run; mwmaint1002] foreachwiki extensions/AbuseFilter/maintenance/fixOldLogEntries.php --dry-run --verbose ([[phab:T228655|T228655]])
* 12:53 moritzm: installing PHP updates on matomo1001/piwik
* 12:28 moritzm: installing PHP 7.0 security updates
* 12:11 Urbanecm: EU SWAT done
* 12:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|728d739}}: Configure logo for ngwikimedia ([[phab:T242416|T242416]]) (duration: 01m 04s)
* 12:05 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: {{Gerrit|64240e1}}: Add logos for ngwikimedia ([[phab:T242416|T242416]]) (duration: 01m 04s)
* 11:19 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1280.eqiad.wmnet
* 11:08 moritzm: installing boost update from Buster point release
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after crash - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10468 and previous config saved to /var/cache/conftool/dbconfig/20200220-105117-marostegui.json
* 10:12 Reedy: created $wikidb.blobs_cluster27 on es1023 - [[phab:T245720|T245720]]
* 10:08 Reedy: created $wikidb.blobs_cluster26 on es1020 - [[phab:T245720|T245720]]
* 10:08 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/storage/make-all-blobs: (no justification provided) (duration: 01m 04s)
* 09:42 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/storage/make-all-blobs: (no justification provided) (duration: 01m 03s)
* 09:27 reedy@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/WikimediaMaintenance/storage/make-all-blobs: (no justification provided) (duration: 01m 01s)
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after crash - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10467 and previous config saved to /var/cache/conftool/dbconfig/20200220-091233-marostegui.json
* 09:02 akosiaris: restart etherpad-lite on etherpad1002 [[phab:T244238|T244238]]
* 09:00 marostegui: Restart m1 database master db1135 (etherpad will not be available for around 1 minute) - [[phab:T244238|T244238]]
* 08:40 jynus: disable puppet and stop bacula service [[phab:T244238|T244238]]
* 08:35 marostegui: Upgrade mysql on db1135 without restart [[phab:T244238|T244238]]
* 07:47 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q15k (was Q10k) ([[phab:T225057|T225057]]) - in case of cache issues (duration: 01m 03s)
* 07:46 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q15k (was Q10k) ([[phab:T225057|T225057]]) (duration: 01m 03s)
* 07:26 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q10k (was Q8k) ([[phab:T225057|T225057]]) - in case of cache issue (duration: 01m 01s)
* 07:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q10k (was Q8k) ([[phab:T225057|T225057]]) (duration: 01m 03s)
* 07:17 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q8000 ([[phab:T225057|T225057]]) - in case of cache issue (duration: 01m 03s)
* 07:15 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q8000 ([[phab:T225057|T225057]]) (duration: 01m 03s)
* 07:01 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q6000 ([[phab:T225057|T225057]]) - extra sync for cache issue (duration: 01m 04s)
* 07:00 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q6000 ([[phab:T225057|T225057]]) (duration: 01m 06s)
* 06:46 vgutierrez: test trafficserver 8.0.6-rc1 in cp30[64,65]
* 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after crash - [[phab:T245621|T245621]]', diff saved to https://phabricator.wikimedia.org/P10466 and previous config saved to /var/cache/conftool/dbconfig/20200220-062445-marostegui.json
* 06:17 marostegui: Repool labsdb1011
* 06:12 marostegui: Remove partitions from db1101:3318 - [[phab:T239453|T239453]]
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318 to remove revision partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10465 and previous config saved to /var/cache/conftool/dbconfig/20200220-061213-marostegui.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318 this host already had the partitions removed - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10464 and previous config saved to /var/cache/conftool/dbconfig/20200220-061019-marostegui.json
* 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 to remove revision partitions - [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10463 and previous config saved to /var/cache/conftool/dbconfig/20200220-060914-marostegui.json
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 on s8, db1099:3318 back to its original weight', diff saved to https://phabricator.wikimedia.org/P10462 and previous config saved to /var/cache/conftool/dbconfig/20200220-055943-marostegui.json
* 00:22 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571860{{!}}Allow non-autoconfirmed users to propose OAuth apps (T213760)]] (duration: 01m 04s)
* 00:16 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573397{{!}}Enable password-reset (requireemail pref) on test WD and Commons (T245660)]] (duration: 01m 03s)


== 2020-02-19 ==
== 2021-07-24 ==
* 23:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw138[0-3].eqiad.wmnet
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
* 23:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw137[4-9].eqiad.wmnet
* 23:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1363.eqiad.wmnet
* 23:28 jforrester@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: cirrus: Reduce CirrusSearch-MoreLike cache workers and queue back to normal (duration: 01m 03s)
* 23:26 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw138[0-3].eqiad.wmnet
* 23:26 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw137[4-9].eqiad.wmnet
* 23:25 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1363.eqiad.wmnet
* 23:23 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: redirect more_like from codfw back to eqiad (duration: 01m 04s)
* 23:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:10 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:57 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@c16c63a]: articletopic thresholding for ores scores and eventgate port update (duration: 00m 57s)
* 22:56 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@c16c63a]: articletopic thresholding for ores scores and eventgate port update
* 22:54 robh: cp3050 & cp3051 returned to service via [[phab:T243167|T243167]]
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:49 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:42 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set wgServer to protocol-relative for Wikitech and Test Wikitech (duration: 01m 05s)
* 22:37 robh: taking cp3050 & cp3051 offline for firmware update via [[phab:T243167|T243167]]
* 22:23 mutante: phabricator - upgrading PHP packages
* 22:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw231([0-6]).codfw.wmnet
* 22:12 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw231([0-6]).codfw.wmnet
* 22:11 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(6[4-9]{{!}}7[0-3]{{!}}84).eqiad.wmnet
* 22:10 rzl@cumin1001: conftool action : set/weight=30; selector: name=mw13(6[4-9]{{!}}7[0-3]{{!}}84).eqiad.wmnet
* 22:08 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2314.codfw.wmnet
* 21:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:54 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:52 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 21:48 bblack: all authdns servers - upgrade to gdnsd-3.2.2
* 21:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:31 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:29 rzl@cumin1001: START - Cookbook sre.hosts.downtime
* 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:55 eileen: civicrm revision changed from {{Gerrit|52c68911c6}} to {{Gerrit|a6b222c19f}}, config revision is {{Gerrit|561ae21f77}}
* 20:15 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase/lib: Fix stastd metric for StatsdMissRecordingSimpleCache (wb_terms work) (duration: 01m 06s)
* 20:13 rzl@cumin1001: conftool action : set/weight=30; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 20:12 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase/lib: Fix stastd metric for StatsdMissRecordingSimpleCache (wb_terms work) (duration: 01m 06s)
* 20:10 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase/lib: Fix stastd metric for StatsdMissRecordingSimpleCache (wb_terms work) (duration: 01m 05s)
* 20:05 jforrester@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.20 (duration: 01m 03s)
* 20:04 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.20
* 20:02 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 20:02 rzl@cumin1001: conftool action : set/weight=10; selector: name=mw13(5[6-9]{{!}}6[0-2]).eqiad.wmnet
* 19:54 rlazarus: scap pull on new api servers mw13[56-62]
* 19:50 mutante: generating mcrouter certs for new codfw mw appservers
* 19:39 mutante: initial puppet run on new hosts mw231*
* 19:31 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/skins/MinervaNeue/includes/MinervaHooks.php: [[phab:T245162|T245162]] Check title value before proceeding to check if user page (duration: 01m 04s)
* 19:27 jforrester@deploy1001: Synchronized php-1.35.0-wmf.20/skins/MinervaNeue/includes/MinervaHooks.php: [[phab:T245162|T245162]] Check title value before proceeding to check if user page (duration: 01m 04s)
* 19:21 jforrester@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: [[phab:T244577|T244577]] [metawiki] Disable MobileFrontend mainpage special casing (duration: 01m 04s)
* 19:18 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244369|T244369]] [trwiki] Enable the WikidataPageBanner extension (duration: 01m 05s)
* 19:11 jforrester@deploy1001: Synchronized php-1.35.0-wmf.20/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: [[phab:T245570|T245570]] resourceloader: fix SqlDependencyModuleStore::setMulti() to use upsert() (duration: 01m 01s)
* 18:45 bblack: dns4001 - upgraded to gdnsd-3.2.2
* 18:44 bblack: reprepro: upload gdnsd 3.2.2-1~wmf1 to buster-wikimedia
* 18:39 mutante: mwmaint1002 - sudo systemctl reset-failed to clear systemd alerts
* 18:38 mutante: mwmaint1002 - removing Icinga ACK for systemd state - comments for it were from HHVM removal in Oct 2019
* 18:26 mutante: phab2001 - upgraded ssh-server, kept locally modified config; apt autoremove removes python3-debconf
* 18:23 mutante: phab2001 - installing package upgrades, incl. openssh, PHP version
* 18:22 mutante: phab2001 - upgrading mariadb client package versions
* 18:19 mutante: removing problem ACK from Icinga alerts for wikitech-static MediaWiki version. comments were about things in 2019
* 17:48 robh: cp1089 cp1090 returned to service via [[phab:T243167|T243167]]
* 17:40 jynus: starting data check between db1078 and db1140:3313 [[phab:T244958|T244958]]
* 17:39 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q4000 ([[phab:T225057|T225057]]) (just in case of cache issue) (duration: 01m 04s)
* 17:26 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Start reading for the new term store for clients up to Q4000 ([[phab:T225057|T225057]]) (duration: 01m 01s)
* 17:14 ema: cp4026: repool after probe Connection:keep-alive experiment revert https://gerrit.wikimedia.org/r/573337
* 17:12 robh: cp1088 returned to service, cp1089 & cp1090 offline for firmware update via [[phab:T243167|T243167]]
* 16:44 papaul: replacing ps1-a8-codfw mgmt in rack A8 will go down
* 16:37 otto@deploy1001: Finished deploy [analytics/refinery@e23918a]: Updating eventgate-analytics port ([[phab:T245203|T245203]]) and also eventlogging whitelist (duration: 12m 27s)
* 16:32 ema: depool cp4026, 5xx
* 16:24 otto@deploy1001: Started deploy [analytics/refinery@e23918a]: Updating eventgate-analytics port ([[phab:T245203|T245203]]) and also eventlogging whitelist
* 16:13 marostegui: Depool labsdb1011 to help replication to catch up
* 16:05 elukey: Update analytics-in4 filter term eventgate for [[phab:T245203|T245203]] on cr1/cr2 eqiad
* 15:48 ariel@deploy1001: Finished deploy [dumps/dumps@b42acb5]: fix temp stub generation, add pagerangeinfo cache, some unit tests (duration: 00m 03s)
* 15:48 ariel@deploy1001: Started deploy [dumps/dumps@b42acb5]: fix temp stub generation, add pagerangeinfo cache, some unit tests
* 14:59 marostegui: Stop mysql on es2021 - [[phab:T243052|T243052]]
* 14:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 14:29 marostegui: Data checksum on db1084 [[phab:T245621|T245621]]
* 14:07 marostegui: Upgrade and reboot db1084 - [[phab:T245621|T245621]]
* 14:02 marostegui: Start mysql on db1084 without replication - [[phab:T245621|T245621]]
* 13:53 jbond42: disable puppet to upgrade postgresql
* 13:30 jynus@cumin1001: dbctl commit (dc=all): 'Depool db1084, lots of connection errors', diff saved to https://phabricator.wikimedia.org/P10458 and previous config saved to /var/cache/conftool/dbconfig/20200219-133057-jynus.json
* 12:25 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573236{{!}}Start reading for the new term store for clients up to Q2000 (T225057)]], take II, the cache issue (duration: 01m 04s)
* 12:22 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:573236{{!}}Start reading for the new term store for clients up to Q2000 (T225057)]] (duration: 01m 06s)
* 11:56 volans: better splay of periodic scripts that interact with Netbox - [[phab:T244291|T244291]]
* 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 11:08 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.20/extensions/Wikibase/lib/includes/Store: Get rid of useless metrics in EntityTermLookupBase ([[phab:T245592|T245592]]) (duration: 01m 04s)
* 11:06 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase/lib/includes/Store: Get rid of useless metrics in EntityTermLookupBase ([[phab:T245592|T245592]]) (duration: 01m 12s)
* 11:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:58 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:58 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 10:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 10:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 10:45 jynus: upgrading mariadb client on cumin hosts
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3315, db2089:3316 after new package testing', diff saved to https://phabricator.wikimedia.org/P10457 and previous config saved to /var/cache/conftool/dbconfig/20200219-103806-marostegui.json
* 10:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 10:17 jynus: stopping db2089 mariadb@s5
* 10:12 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=apache2,name=mw135[0-5]*.eqiad.wmnet
* 10:12 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw135[0-5]*.eqiad.wmnet
* 10:11 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1349.eqiad.wmnet
* 10:11 jiji@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=apache2,name=mw1349.eqiad.wmnet
* 10:09 moritzm: updated tftpboot environment for stretch-bootif for the 9.12 point release [[phab:T241359|T241359]]
* 09:53 jynus: stopping and upgrading db1140 instances
* 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3315, db2089:3316 for new package testing', diff saved to https://phabricator.wikimedia.org/P10455 and previous config saved to /var/cache/conftool/dbconfig/20200219-095139-marostegui.json
* 09:51 marostegui: Depool db2089:3315, db2089:3316 for new package testing
* 09:49 akosiaris: [[phab:T245516|T245516]]. Deploy mathoid chart version 0.0.27, removing logstash gelf configuration
* 09:46 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'mathoid' for release 'production' .
* 09:43 vgutierrez: test trafficserver 8.0.6-rc1 in cp40[26,32]
* 09:34 _joe_: cleared opcache on mw1313
* 09:34 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'canary' .
* 09:34 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 09:33 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'mathoid' for release 'staging' .
* 08:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 08:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 08:50 marostegui: Remove dbproxy1007 grants from m2 - [[phab:T231280|T231280]]
* 08:41 marostegui: Remove wikiadmin2 user from s7 - [[phab:T243512|T243512]]
* 08:23 Urbanecm: run mwscript deleteEqualMessages.php cswiki --delete
* 08:14 godog: roll restart swift proxies - [[phab:T244776|T244776]]
* 07:02 marostegui: Remove wikiadmin2 user from es2 - [[phab:T243512|T243512]]
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Increase API weight for db1107 50 -> 100 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10454 and previous config saved to /var/cache/conftool/dbconfig/20200219-065726-marostegui.json
* 06:35 marostegui: Compress watchlist_expiry table on s3 (this will take hours as I have left a 60 seconds sleep between tables) - [[phab:T245358|T245358]]
* 06:17 marostegui: Compress new and empty watchlist_expiry table - [[phab:T245358|T245358]]
* 01:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1353.eqiad.wmnet
* 01:27 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:24 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1354.eqiad.wmnet
* 01:22 mutante: mw1353 - restarted apache (some race condition on new installs, 5 other servers did not have the issue)
* 01:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1355.eqiad.wmnet
* 01:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1350.eqiad.wmnet
* 01:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1351.eqiad.wmnet
* 01:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1352.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1355.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1354.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1350.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1353.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1351.eqiad.wmnet
* 01:14 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1352.eqiad.wmnet
* 01:03 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:01 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T240728|T240728]] Fix Latin Wikipedia (VICIPÆDIA) wordmark and set size correctly (duration: 01m 06s)
* 01:01 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:45 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:43 James_F: Manually purged https://en.wikipedia.org/images/mobile/copyright/wikipedia-wordmark-la.svg and .png from Varnish for [[phab:T240728|T240728]]
* 00:41 jforrester@deploy1001: Synchronized static/images/mobile/copyright/: [[phab:T240728|T240728]] Sync logo images (duration: 01m 04s)
* 00:40 mutante: mw1351 through mw1355 - initial puppet runs - new appservers
* 00:36 niharika29@deploy1001: Synchronized static/images/mobile/copyright/: Remove unnecessary id from wordmark (duration: 01m 03s)
* 00:34 niharika29@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Adjust MT Threshold for Assamese to 70% - [[phab:T245509|T245509]] (duration: 01m 04s)
* 00:24 niharika29@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/WikimediaEvents/: Follow up on authevents statsd changes in {{Gerrit|I7612b68fe}} (duration: 01m 03s)
* 00:21 niharika29@deploy1001: Synchronized wmf-config/logging.php: Update authmanager-statsd channel name (duration: 01m 03s)
* 00:16 eileen: civicrm revision changed from {{Gerrit|8c77e9e915}} to {{Gerrit|52c68911c6}}, config revision is {{Gerrit|561ae21f77}}
* 00:10 niharika29@deploy1001: Synchronized wmf-config/logging.php: Make the logstash and authmanager-statsd Monolog handlers compatible (duration: 01m 04s)
* 00:08 mutante: creating mcrouter certs for mw1350


== 2020-02-18 ==
== 2021-07-23 ==
* 23:56 mutante: mw1349 - scap pull
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 23:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1349.eqiad.wmnet
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 23:54 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1349.eqiad.wmnet
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:34 maryum: running reindex on mwmaint1002 - [[phab:T194448|T194448]]
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:28 maryum: running reindex for wikimedia wikis
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 23:14 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:12 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2151.wmnet
* 16:15 effie: enable puppet on mc-gp* hosts
* 23:12 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw2150.wmnet
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 23:12 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 22:58 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Enable ores_articletopics field creation for all wikis (extra sync for [[phab:T236104|T236104]]) (duration: 01m 04s)
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 22:54 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Enable ores_articletopics field creation for all wikis (duration: 01m 03s)
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 22:52 chaomodus: completed upgrading Netbox to 2.7.4 [[phab:T244291|T244291]]
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 22:51 crusnov@deploy1001: Finished deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part3) (duration: 00m 11s)
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 22:51 crusnov@deploy1001: Started deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part3)
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 22:49 crusnov@deploy1001: Finished deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part2) (duration: 01m 19s)
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 22:48 crusnov@deploy1001: Started deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (part2)
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 22:46 crusnov@deploy1001: Finished deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]] (duration: 01m 19s)
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 22:45 crusnov@deploy1001: Started deploy [netbox/deploy@f3d56dd]: netbox 2.7.4 upgrade [[phab:T244291|T244291]]
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 22:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244185|T244185]] Raise minimum log level for 'OAuth' from DEBUG to INFO (duration: 01m 04s)
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 22:30 chaomodus: Upgrading Netbox to 2.7.4
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 21:56 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 21:54 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 21:26 XioNoX: rollback tcp-mss clamping in eqiad/eqord
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 21:07 jeh: power down and set Icinga downtime on cloudvirt1022 [[phab:T243536|T243536]]
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 21:07 jeh: power down and set Icinga downtime on cloudvirt1022 [[phab:T241884|T241884]]
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 20:54 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling EventStreamConfig extension on metawiki - [[phab:T242122|T242122]] (duration: 01m 03s)
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 20:47 ppchelko@deploy1001: Finished deploy [changeprop/deploy@e2fe8ca]: respect service name in consumer group [[phab:T244387|T244387]] (duration: 07m 59s)
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 20:45 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enabling EventStreamConfig extension on testwiki - [[phab:T242122|T242122]] (duration: 01m 04s)
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 20:39 ppchelko@deploy1001: Started deploy [changeprop/deploy@e2fe8ca]: respect service name in consumer group [[phab:T244387|T244387]]
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 20:06 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/libs/StatusValue.php: [[phab:T245155|T245155]] StatusValue: Fix __toString() to not choke on special parameters (duration: 01m 04s)
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 20:03 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.20 [[phab:T233868|T233868]]
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 19:52 jforrester@deploy1001: Finished scap: testwiki to 1.35.0-wmf.20 and re-build l10n cache [[phab:T233868|T233868]] (duration: 61m 01s)
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 19:41 papaul: shutting down dns2001 for 10G card troubleshooting
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 19:30 James_F: Running `foreachwiki sql.php php-1.35.0-wmf.19/maintenance/archives/patch-watchlist_expiry.sql` for [[phab:T244631|T244631]]
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 18:51 jforrester@deploy1001: Started scap: testwiki to 1.35.0-wmf.20 and re-build l10n cache [[phab:T233868|T233868]]
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 18:49 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.18 (duration: 15m 29s)
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 18:25 James_F: Running `scap prep` for 1.35.0-wmf.20 ref. [[phab:T233868|T233868]]
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 18:01 James_F: 1.35.0-wmf.20 was branched at {{Gerrit|c664b4f1b933d110bd69f074c399695bd6b17d13}} for [[phab:T233868|T233868]]
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 18:01 marxarelli: completed promotion of 1.35.0-wmf.19 to all wikis ([[phab:T233867|T233867]])
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 17:52 dduvall@deploy1001: rebuilt and synchronized wikiversions files: Re-roll all wikis to 1.35.0-wmf.19 ([[phab:T233867|T233867]])
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 17:47 marxarelli: re-rolling wmf.19 to all wikis ([[phab:T233867|T233867]]) with eyes particularly on ([[phab:T245202|T245202]])
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 17:28 bblack: cp3 (esams edge) - revert GRE MTU mitigations - [[phab:T232602|T232602]]
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 17:00 papaul: resetting ps1-a8-codfw see [[phab:T245164|T245164]]
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 16:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 16:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]
* 16:12 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 16:11 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 16:09 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 16:08 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 16:03 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'production' .
* 16:02 ottomata: deploying new 'canary' and 'production' releases for eventgate-main. (These releases use a new nodePort, and so will not be active until LVS is modified. The old 'main' release and nodePort is left as is.) - [[phab:T242861|T242861]]
* 16:02 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'canary' .
* 15:51 bblack: dns2001 - shutdown for hw/reimage work - [[phab:T242017|T242017]]
* 15:47 bblack: dns2001 - stopping bgp to drain service for hw/reimage work - [[phab:T242017|T242017]]
* 15:41 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 15:40 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:36 jynus: stopping db1140:s3 instance
* 15:35 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 15:34 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:34 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:14 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:08 vgutierrez@puppetmaster1001: conftool action : set/weight=100; selector: dc=eqiad,cluster=cache_text,service=ats-be,name=cp1089.eqiad.wmnet
* 15:04 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 14:56 bblack: esams repooled in DNS
* 14:54 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 14:54 ottomata: deploying new 'canary' and 'production' releases for eventgate-analytics. (These releases use a new nodePort, and so will not be active until LVS is modified. The old 'analytics' release and nodePort is left as is.) - [[phab:T242861|T242861]]
* 14:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 14:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 14:39 XioNoX: remove cr2-esams VRRP handicap - [[phab:T243080|T243080]]
* 14:34 XioNoX: restore default esams-eqiad link cost - [[phab:T243080|T243080]]
* 14:33 XioNoX: re-enable cr2-esams BGP transit/peering - [[phab:T243080|T243080]]
* 14:31 XioNoX: cr2-esams - request chassis routing-engine master switch - [[phab:T243080|T243080]]
* 14:29 XioNoX: re-disable cr2-esams BGP group IX4 - [[phab:T243080|T243080]]
* 14:14 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/DiscussionTools: [[gerrit:572882{{!}}wmf.18: Add config option and query parameter to control loading]] (duration: 01m 11s)
* 14:02 cdanis: depool esams
* 14:01 XioNoX: re-enable cr2-esams BGP group IX4 - [[phab:T243080|T243080]]
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Increase API weight for db1107 25 -> 50 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10448 and previous config saved to /var/cache/conftool/dbconfig/20200218-135525-marostegui.json
* 13:44 XioNoX: installing OS on cr2-esams:re0 - [[phab:T243080|T243080]]
* 13:39 XioNoX: cr2-esams - request chassis routing-engine master switch - [[phab:T243080|T243080]]
* 13:37 XioNoX: deactivate peering/transit on cr2-esams - [[phab:T243080|T243080]]
* 13:24 XioNoX: reboot cr2-esams:re1 (backup) - [[phab:T243080|T243080]]
* 13:23 XioNoX: bump cost of eqiad-esams transport - [[phab:T243080|T243080]]
* 13:10 XioNoX: fail vrrp master to cr3-esams - [[phab:T243080|T243080]]
* 12:58 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 12:55 Amir1: EU SWAT done
* 12:53 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:572731{{!}}Add DiscussionTools to four wikis in hidden mode (T244870)]], take II (duration: 01m 03s)
* 12:52 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:572731{{!}}Add DiscussionTools to four wikis in hidden mode (T244870)]] (duration: 01m 04s)
* 12:45 XioNoX: remove graceful-switchover and nonstop-routing from cr2-esams - [[phab:T243080|T243080]]
* 12:36 XioNoX: push new Junos to cr2-esams:re1 (backup RE, noop) - [[phab:T243080|T243080]]
* 12:22 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:569031{{!}}Wikibase: added config variables to configure entity sources (T242087)]], Part II (duration: 01m 03s)
* 12:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:569031{{!}}Wikibase: added config variables to configure entity sources (T242087)]], Part I, take II (the cache issue) (duration: 01m 04s)
* 12:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:569031{{!}}Wikibase: added config variables to configure entity sources (T242087)]], Part I (duration: 01m 06s)
* 12:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:572628{{!}}Start reading for the new term store for clients up to Q1000 (T225057)]] (duration: 01m 05s)
* 12:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|4b193dd}}: Increase Commons linkpurge rate limit for patrollers ([[phab:T245214|T245214]]) (duration: 01m 31s)
* 11:51 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:48 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:47 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 11:43 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:41 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
* 11:35 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
* 11:27 jynus: reenabling prometheus exporter metadata user for prometheus1003
* 11:10 jynus: temp. disabling prometheus exporter metadata user for prometheus1003
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Increase API weight for db1107 15 -> 25 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10445 and previous config saved to /var/cache/conftool/dbconfig/20200218-104958-marostegui.json
* 09:27 gehel: re-enable puppet on mw* - [[phab:T222321|T222321]]
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1107 after temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10444 and previous config saved to /var/cache/conftool/dbconfig/20200218-091343-marostegui.json
* 09:09 gehel: disabling puppet on mw* to deploy apache config change - [[phab:T222321|T222321]]
* 09:07 volans: rm /var/log/exim4/paniclog on cumin1001 to clear OOM from last week error
* 08:59 marostegui: Remove wikiadmin2 grants from es1 [[phab:T243512|T243512]]
* 08:59 marostegui: Remove wikiadmin2 grants from es1
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107 after temporary change optimizer options', diff saved to https://phabricator.wikimedia.org/P10443 and previous config saved to /var/cache/conftool/dbconfig/20200218-085713-marostegui.json
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107 after temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10442 and previous config saved to /var/cache/conftool/dbconfig/20200218-082306-marostegui.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107 after temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10441 and previous config saved to /var/cache/conftool/dbconfig/20200218-080952-marostegui.json
* 08:08 marostegui: Restart MySQL to pick up optimizer_switch changes - [[phab:T245489|T245489]]
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to temporary change optimizer options - [[phab:T245489|T245489]]', diff saved to https://phabricator.wikimedia.org/P10440 and previous config saved to /var/cache/conftool/dbconfig/20200218-080623-marostegui.json
* 07:34 elukey: powercycle analytics1065 (crashed hours ago, no mgmt console available, no ssh)
* 06:39 marostegui: Remove wikiadmin2 from pc1007, pc1008, pc1009 and pc1010 [[phab:T243512|T243512]]
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1107 100 -> 200 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10439 and previous config saved to /var/cache/conftool/dbconfig/20200218-063819-marostegui.json
* 06:27 marostegui: Stop haproxy on dbproxy1007 - [[phab:T245385|T245385]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 100 and weight 10 in API for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10438 and previous config saved to /var/cache/conftool/dbconfig/20200218-062459-marostegui.json
* 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:08 marostegui@cumin1001: START - Cookbook sre.hosts.decommission


== 2020-02-17 ==
== 2021-07-22 ==
* 19:56 cdanis: finish enabling TCP-MSS clamping in eqiad
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 19:49 cdanis: s/no-op//
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platform migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 19:49 cdanis: no-op enable TCP-MSS clamping on eqord and eqiad
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 19:33 cdanis: no-op enable flowspec change on cr2-eqord and cr2-eqiad
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 18:25 elukey: restart kafka on kafka-jumbo1001 to pick up new openjdk updates
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 17:25 bblack: GRE MTU mitigations applied to esams cp hosts only - [[phab:T232602|T232602]]
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 15:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 15:50 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 15:48 ayounsi@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 15:48 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 15:44 cdanis: ✔️ cdanis@icinga1001.wikimedia.org ~ 🕥☕ sudo systemctl restart ircecho
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10422 and previous config saved to /var/cache/conftool/dbconfig/20200217-143146-marostegui.json
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 14:17 ema: reprepro includedeb buster-wikimedia ~ema/cadvisor_0.35.0+ds1-4_amd64.deb [[phab:T183146|T183146]]
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 12:34 XioNoX: add test flowspec rules to cr3-knams
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 12:34 moritzm: installing postgresql-9.4 security updates
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 12:27 vgutierrez: reboot acmechief instances (kernel upgrade)
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 10:31 jynus: dropping all databases from db1140:3313
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 10:22 marostegui@cumin1001: dbctl commit (dc=all): ' db1107 increase API weight from 10 to 15 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10420 and previous config saved to /var/cache/conftool/dbconfig/20200217-102218-marostegui.json
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 10:20 vgutierrez: rolling restart of ats-tls and varnish-fe on ulsfo to enable KA between them - [[phab:T244464|T244464]]
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 10:00 moritzm: installing Linux 4.9.210 kernels on stretch systems
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 09:10 godog: correction, +100G
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 09:09 godog: +10G to prometheus/ops fs on prometheus eqiad - [[phab:T245361|T245361]]
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 09:06 godog: +50G to prometheus/ops fs on prometheus eqiad - [[phab:T245361|T245361]]
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 07:22 marostegui: Stop haproxy on dbproxy1002 - [[phab:T245384|T245384]]
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 14:27 moritzm: installing libwebp security updates on stretch
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 11:36 Lucas_WMDE: EU backport+config window done
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:42 effie: restart pybal on lvs2010 and lvs1016
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE


== 2020-02-15 ==
== 2021-07-21 ==
* 01:01 cdanis: ✔️ cdanis@an-coord1001.eqiad.wmnet ~ 🕗🍺 sudo systemctl restart hive-server2.service ; sudo systemctl restart hive-metastore.service
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:27 dancy: testing upcoming Scap release on beta
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 10:50 moritzm: installing systemd security updates on bullseye
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 10:14 effie: enable puppet on mw* servers
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 08:45 effie: disable puppet on codfw mw hosts to deploy 702592
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:17 effie: enable puppet on alert*
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 07:16 godog: powercycle ms-be2048
* 07:03 moritzm: installing systemd security updates on stretch
* 06:51 effie: restart memcached on eqiad mc* hosts
* 06:51 effie: enable puppet on mc* hosts
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2020-02-14 ==
== 2021-07-20 ==
* 23:42 XenoRyet: updated civicrm from {{Gerrit|cf86495d44}} to {{Gerrit|8c77e9e915}}
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 21:01 volker-e@deploy1001: Finished deploy [design/style-guide@1928c00]: Deploy design/style-guide: (duration: 00m 09s)
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 21:01 volker-e@deploy1001: Started deploy [design/style-guide@1928c00]: Deploy design/style-guide:
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 20:21 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Prevent some logspam [[phab:T245280|T245280]] (duration: 01m 05s)
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 19:27 XenoRyet: updated civicrm from {{Gerrit|55b2afb6eb}} to {{Gerrit|cf86495d44}}
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 19:10 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase: [[phab:T245062|T245062]] Prevent invalid term languages from cached PrefetchingTermLookup (duration: 01m 09s)
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 17:37 jforrester@deploy1001: Unlocked for deployment [ALL REPOSITORIES]: Testing [[phab:T245062|T245062]] fix on mwdebug1001 (duration: 03m 05s)
* 17:06 rzl: enabled puppet on A:mw
* 17:33 jforrester@deploy1001: Locking from deployment [ALL REPOSITORIES]: Testing [[phab:T245062|T245062]] fix on mwdebug1001 (planned duration: 60m 00s)
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 16:11 moritzm: installing git-lfs updates from Buster 10.3 point update
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 15:55 moritzm: uploaded pypuppetdb 0.3.3-2~wmf+deb10u1 to apt.wikimedia.org
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 15:55 bblack: (log(n))
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10414 and previous config saved to /var/cache/conftool/dbconfig/20200214-155443-marostegui.json
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 15:52 moritzm: uploaded pypuppetdb 0.3.3-2~wmf+deb9u1 to apt.wikimedia.org
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:46 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Resync InitialiseSettings to try and pick up previously deployed cirrus query routing changes (duration: 01m 05s)
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 15:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 15:32 effie: restart mc-gp* for updates
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 15:17 bd808: Toil reduction: !log messages now work from the SRE team's Freenode channel.
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 13:50 gehel: restart relforge for JVM upgrade - [[phab:T245120|T245120]]
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 10:35 vgutierrez: revert ats 8.0.6-rc0 experiment on cp40[26,32]
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 10:14 vgutierrez: rolling restart of ats-be to enable TLSv1.3 against origin servers - [[phab:T170567|T170567]]
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10409 and previous config saved to /var/cache/conftool/dbconfig/20200214-093456-marostegui.json
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 09:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 09:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 09:25 volans: manually absented /usr/local/bin/apt2xml on the 5 hosts with puppet disabled
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
* 09:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 09:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
* 09:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 09:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:53 urbanecm: Start server-side upload for 7 large PNG files ([[phab:T285708|T285708]])
* 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
* 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 08:46 moritzm: installing 4.19.98 kernel update on Buster systems
* 14:46 vgutierrez: depool dns1002 - [[phab:T286069|T286069]]
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 100 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10408 and previous config saved to /var/cache/conftool/dbconfig/20200214-080600-marostegui.json
* 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 06:51 vgutierrez: updating puppet compiler facts
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 01:27 dpifke@deploy1001: Finished deploy [performance/navtiming@2eec00a]: (no justification provided) (duration: 00m 05s)
* 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 01:27 dpifke@deploy1001: Started deploy [performance/navtiming@2eec00a]: (no justification provided)
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 00:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T245202|T245202]] cirrus: Move all move_like traffic to codfw (duration: 01m 02s)
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 00:51 jforrester@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: [[phab:T245202|T245202]] cirrus: Increase the pool counter limits a bit (duration: 01m 05s)
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
* 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10{{!}}0[1-9]).codfw.wmnet
* 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - [[phab:T285643|T285643]]
* 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 12:44 moritzm: installing systemd security updates on buster
* 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
* 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
* 11:58 Lucas_WMDE: EU config+backport window done
* 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (3/3) (duration: 00m 56s)
* 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (2/3) (duration: 00m 56s)
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (1/3) (duration: 00m 56s)
* 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 3/3) (duration: 00m 56s)
* 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 2/3) (duration: 00m 56s)
* 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 1/3) (duration: 00m 57s)
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705498{{!}}Add patroller group for ckbwiki (T285221)]] (duration: 00m 57s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (2/2, beta) (duration: 00m 56s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (1/2, prod) (duration: 00m 57s)
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704867{{!}}Update config for language switching on pilot wikis (T286459)]] (duration: 00m 59s)
* 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
* 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
* 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
* 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
* 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
* 03:17 eileen: civicrm revision changed from {{Gerrit|20e9ef6bbb}} to {{Gerrit|819c11307d}}, config revision is {{Gerrit|bb405c5232}}


== 2020-02-13 ==
== 2021-07-19 ==
* 22:13 jeh: running filesystem tests on cloudvirt1024 [[phab:T241884|T241884]]
* 20:48 urbanecm: Deploy security patch for [[phab:T286884|T286884]]
* 21:42 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 20:29 vgutierrez: pool text@codfw - [[phab:T286921|T286921]]
* 21:41 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:40 jbond42: refresh facts on compilers
* 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
* 21:38 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467{{!}}prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)
* 21:37 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
* 21:35 ottomata: deploying production and canary releases for eventgate-logging-external (and destroying the 'logging-external' release) (safe because eventgate-logging-external is not in use) - [[phab:T245203|T245203]]
* 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448{{!}}Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)
* 21:29 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
* 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - [[phab:T286921|T286921]]
* 21:28 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
* 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - [[phab:T286921|T286921]]
* 20:33 marxarelli: rollback to group1 due to 500 spike (2k/min) ([[phab:T233867|T233867]])
* 18:46 brennen: gerrit1001: restarting gerrit
* 20:32 dduvall@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
* 18:40 vgutierrez: stop pybal on lvs2009 - [[phab:T286921|T286921]]
* 20:30 marxarelli: varnish 500 spike. rolling back
* 18:38 brennen: re-enabling puppet on gerrit1001
* 20:20 gehel: restarting blazegraph + updater on wdqs2006
* 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - [[phab:T286921|T286921]]
* 20:19 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.19
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 19:44 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/api/ApiRollback.php: [[phab:T245159|T245159]] ApiRollback: Properly deal with UserIdentity (duration: 01m 04s)
* 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
* 19:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/resourceloader/ResourceLoaderSkinModule.php: [[phab:T245182|T245182]] ResourceLoaderSkinModule: Don't hard-deprecate wgLogoHD just now (duration: 01m 03s)
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P<nowiki>{</nowiki>cloudelastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 19:17 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T219534|T219534]] Add new MLR models for Cirrus on zh/ja/kowiki (duration: 01m 03s)
* 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - [[phab:T286921|T286921]]
* 19:10 moritzm: installing e2fsprogs security updates
* 18:20 vgutierrez: enabling pybal on lvs2007 - [[phab:T286921|T286921]]
* 18:48 bblack: ns1.wikimedia.org - re-routing back to authdns2001 instead of dns2002 on cr[12]-codfw - [[phab:T242017|T242017]]
* 18:19 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:38 bblack: authdns2001 - reboot - [[phab:T242017|T242017]]
* 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
* 18:36 bblack: ns1.wikimedia.org - re-routing from authdns2001 to dns2002 on cr[12]-codfw - [[phab:T242017|T242017]]
* 18:06 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:705437{{!}}pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s)
* 18:09 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I9d0c8af3c577}} (duration: 01m 06s)
* 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 18:00 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Iae1f45896}} (duration: 01m 06s)
* 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:59 volans: downtimed mgmt in eqiad for 1h
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:58 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Iae1f45896}} (duration: 01m 08s)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:49 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Ibfca686f681}} (duration: 01m 06s)
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
* 17:41 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|Iefff596955e}} (duration: 01m 08s)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:40 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Iefff596955e}} (duration: 01m 06s)
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
* 17:35 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|I2e4fb0c086de0f8ac}} (duration: 01m 06s)
* 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:32 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I2e4fb0c086de0f8ac}} (duration: 01m 06s)
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
* 17:10 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: No-op (code style only) deploy sync (duration: 01m 07s)
* 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:09 jforrester@deploy1001: sync aborted: wmf-config/CommonSettings.php No-op (code style only) deploy sync (duration: 00m 04s)
* 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
* 17:09 jforrester@deploy1001: Started scap: wmf-config/CommonSettings.php No-op (code style only) deploy sync
* 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
* 16:32 robh: ps1-a8-codfw.mgmt.codfw.wmnet firmware upgraded via [[phab:T245164|T245164]]
* 17:30 volans: running puppet on elastic2038 after network was restored
* 16:28 papaul: rebooting elastic2043 for firmware upgrade
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
* 16:22 gehel: canceled the restart of elastic2043 - [[phab:T243715|T243715]]
* 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 16:21 gehel: restarting elastic2043 - [[phab:T243715|T243715]]
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
* 16:10 _joe_: depooling/repooling mw1240
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 16:02 _joe_: pooled mw1238 again
* 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 15:59 _joe_: depooling mw1238 for analysis
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 15:42 vgutierrez: rolling restart of ats-be on esams - [[phab:T170567|T170567]]
* 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 15:38 vgutierrez: disable allow_half_open on ats-tls @ cp4031 - [[phab:T236458|T236458]]
* 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 15:27 vgutierrez: turning on TLSv1.3 between ats-be and applayer in cp30[51-52] - [[phab:T170567|T170567]]
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 15:22 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/WikibaseMediaInfo/resources/: UBN fix: Force non-value to be undefined (duration: 01m 06s)
* 17:23 volans: running authdns-update to force-update authdns2001
* 14:51 vgutierrez: test TLSv1.3 between ats-be and applayer in cp3050 - [[phab:T170567|T170567]]
* 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:47 XioNoX: re-image rpki2001 - [[phab:T244585|T244585]]
* 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:33 XioNoX: add routinator_0.6.4_amd64.deb to buster-wikimedia apt repo
* 17:21 XioNoX: remove ns1 redirect - [[phab:T286787|T286787]]
* 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10405 and previous config saved to /var/cache/conftool/dbconfig/20200213-142735-marostegui.json
* 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
* 14:24 XioNoX: re-enable ping offload in esams - [[phab:T244584|T244584]]
* 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:31 XioNoX: disable ping offload in esams - [[phab:T244584|T244584]]
* 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:24 XioNoX: re-enable ping offload in eqiad - [[phab:T244584|T244584]]
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
* 13:06 XioNoX: disable ping offload in eqiad - [[phab:T244584|T244584]]
* 17:10 XioNoX: enable asw-a2-codfw access ports - [[phab:T286787|T286787]]
* 13:03 XioNoX: re-enable ping offload in codfw - [[phab:T244584|T244584]]
* 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - [[phab:T286787|T286787]]
* 13:00 vgutierrez: pool cp10[75,76] running buster - [[phab:T242093|T242093]]
* 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
* 12:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
* 12:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
* 12:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
* 12:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
* 12:34 Amir1: EU SWAT is done
* 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
* 12:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571961{{!}}Read and write more in the new term store]], take II, the cache issue ([[phab:T219123|T219123]] [[phab:T225055|T225055]]) (duration: 01m 03s)
* 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
* 12:29 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571961{{!}}Read and write more in the new term store]] (duration: 01m 03s)
* 16:40 XioNoX: update asw-a2-codfw serial number - [[phab:T286787|T286787]]
* 12:29 vgutierrez: depool cp10[75,76] and reimage as buster - [[phab:T242093|T242093]]
* 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:28 vgutierrez: pool cp10[77,78] running buster - [[phab:T242093|T242093]]
* 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
* 12:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571956{{!}}Revert: Triple the factor of WDQS lag to maxlag for Wikidata (T244722)]] (duration: 01m 04s)
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 12:18 XioNoX: re-image ping2001 to buster - [[phab:T244584|T244584]]
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 12:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 12:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|1c81925}}: Create Test Custodians group at Beta Wikiversity ([[phab:T240438|T240438]]) (duration: 01m 07s)
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 12:13 XioNoX: disable ping offload in codfw
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
* 12:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
* 12:13 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|0f035e4}}: Update wgAvailableRights declaration of autoreviewprotected ([[phab:T230103|T230103]]) (duration: 01m 03s)
* 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
* 12:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:21 mutante: depooled logstash2021 for dcops maintenance work
* 12:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: {{Gerrit|176b0e8}}: Grant autopatrol to azwiki patrollers ([[phab:T244338|T244338]]) (duration: 01m 05s)
* 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
* 11:53 vgutierrez: depool cp10[77,78] and reimage as buster - [[phab:T242093|T242093]]
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
* 11:52 vgutierrez: pool cp10[79,80] running buster - [[phab:T242093|T242093]]
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 11:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
* 11:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
* 11:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|310be45f7}} (duration: 00m 57s)
* 11:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
* 11:18 vgutierrez: rolling upgrade of ATS to version 8.0.5-1wm16 fleet wide - [[phab:T244464|T244464]]
* 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 11:16 vgutierrez: depool cp10[79,80] and reimage as buster - [[phab:T242093|T242093]]
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
* 11:12 ema: A:cp re-enable puppet, leave it to cron to apply wikimedia-common/wikimedia-frontend VCL merge [[phab:T241239|T241239]]
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
* 11:08 vgutierrez: upload trafficserver 8.0.5-1wm16 to apt.wm.o (buster) - [[phab:T244464|T244464]]
* 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2bdfbd258e}} (duration: 00m 57s)
* 11:02 vgutierrez: pool cp10[81,82] and reimage as buster - [[phab:T242093|T242093]]
* 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:59 ema: cp4021 (cache_upload): apply wikimedia-common/wikimedia-frontend VCL merge [[phab:T241239|T241239]]
* 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I069c7b53}} (duration: 00m 58s)
* 10:49 ema: cp4027 (cache_text): apply wikimedia-common/wikimedia-frontend VCL merge [[phab:T241239|T241239]]
* 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
* 10:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 15:10 godog: +100G to prometheus/ops in codfw
* 10:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:59 vgutierrez: rolling restart of eqiad pybal instances
* 10:23 vgutierrez: removing /root/.ssh/known_hosts in cumin1001
* 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 10:21 vgutierrez: pool cp10[83,84] running buster - [[phab:T242093|T242093]]
* 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 10:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:42 vgutierrez: rolling restart of codfw pybal instances
* 10:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:33 vgutierrez: rolling restart of eqsin pybal instances
* 10:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:23 vgutierrez: rolling restart of ulsfo pybal instances
* 10:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 09:45 vgutierrez: depool cp10[83,84] and reimage as buster - [[phab:T242093|T242093]]
* 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
* 09:45 vgutierrez: pool cp10[85,86] running buster - [[phab:T242093|T242093]]
* 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 09:10 moritzm: installing Java security updates on elastic* and relforge*
* 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1107 50 -> 100 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10403 and previous config saved to /var/cache/conftool/dbconfig/20200213-085957-marostegui.json
* 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
* 08:57 gehel: restart elasticsearch on elastic2051 - JVM upgrade
* 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 08:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
* 08:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 08:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 07:57 moritzm: installing Java security updates on Hadoop, Kafka/Jumbo, AQS and Druid canaries
* 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
* 07:57 vgutierrez: depool cp10[85,86] and reimage as buster - [[phab:T242093|T242093]]
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
* 07:53 moritzm: rolling restart of restbase-dev to pick up Java security update
* 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
* 07:49 vgutierrez: pool cp10[87,88] running buster - [[phab:T242093|T242093]]
* 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 07:49 vgutierrez: testing ATS 8.0.5-1wm16 + KA between ats-tls and varnish-fe in cp4031 - [[phab:T244464|T244464]]
* 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
* 07:47 moritzm: installing Java security updates on stat/SWAP hosts
* 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 50 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10402 and previous config saved to /var/cache/conftool/dbconfig/20200213-072839-marostegui.json
* 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 07:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 07:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
* 07:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
* 07:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 07:03 vgutierrez: depool cp10[87,88] and reimage as buster - [[phab:T242093|T242093]]
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
* 07:02 vgutierrez: pool cp10[89,90] running buster - [[phab:T242093|T242093]]
* 11:40 moritzm: installing bluez security updates
* 06:49 vgutierrez: pool cp20[02,05] running buster - [[phab:T242093|T242093]]
* 11:31 Lucas_WMDE: EU backport+config window done
* 06:36 marostegui: Upgrade and compress db1087, this will generate lag on s8 on the wiki replicas - [[phab:T232446|T232446]]
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:703205{{!}}Add config for updated PropertySuggester beta cluster (T285098)]] (beta-only) (duration: 00m 57s)
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for compression - [[phab:T232446|T232446]]', diff saved to https://phabricator.wikimedia.org/P10401 and previous config saved to /var/cache/conftool/dbconfig/20200213-063535-marostegui.json
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 06:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1099:3318 into vslow for s8 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10400 and previous config saved to /var/cache/conftool/dbconfig/20200213-063334-marostegui.json
* 09:52 moritzm: imported megacli for bullseye-wikimedia [[phab:T282272|T282272]] [[phab:T275873|T275873]]
* 06:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
* 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10399 and previous config saved to /var/cache/conftool/dbconfig/20200213-063207-marostegui.json
* 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
* 06:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:15 vgutierrez: depool codfw text traffic
* 06:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10398 and previous config saved to /var/cache/conftool/dbconfig/20200213-062642-marostegui.json
* 03:26 twentyafterfour: restarted phd on phab1001
* 06:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:25 twentyafterfour: investigating PHD failure
* 06:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10397 and previous config saved to /var/cache/conftool/dbconfig/20200213-062148-marostegui.json
* 06:19 vgutierrez: testing a new build of ATS 8.0.6 in cp40[26,32]
* 06:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10396 and previous config saved to /var/cache/conftool/dbconfig/20200213-061219-marostegui.json
* 06:11 vgutierrez: depool cp10[89,90] and reimage as buster - [[phab:T242093|T242093]]
* 06:04 vgutierrez: depool cp20[02,05] and reimage as buster - [[phab:T242093|T242093]]
* 06:04 vgutierrez: pool cp20[01,08] running buster - [[phab:T242093|T242093]]
* 06:02 twentyafterfour: set phabricator read-only to false
* 06:01 twentyafterfour: set phabricator read-only
* 06:00 marostegui: Start phabricator maintenance [[phab:T244566|T244566]]
* 05:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:53 marostegui: Upgrade db1128 without restarting mysql - [[phab:T244566|T244566]]
* 05:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:47 marostegui: Silence m3 hosts for maintenance - [[phab:T244566|T244566]]
* 05:38 vgutierrez: depool cp2008 and reimage as buster - [[phab:T242093|T242093]]
* 05:37 vgutierrez: pool cp2011 running buster - [[phab:T242093|T242093]]
* 05:35 vgutierrez: depool cp2001 and reimage as buster - [[phab:T242093|T242093]]
* 05:34 vgutierrez: pool cp2004 running buster - [[phab:T242093|T242093]]
* 05:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 05:09 vgutierrez: depool cp20[04,11] and reimage as buster - [[phab:T242093|T242093]]
* 03:57 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:57 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:54 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:32 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:30 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:28 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 03:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 03:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 02:44 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 02:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 02:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 02:33 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 01:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 01:10 twentyafterfour: no apparent problems with phabricator upgrade, all done
* 01:01 twentyafterfour: starting phabricator deploy, momentary downtime expected while apache restarts
* 00:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:56 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:54 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:45 niharika29@deploy1001: Synchronized wmf-config/throttle.php: Throttle rule for National Gallery of Canada Library and Archives edit-a-thon - [[phab:T244488|T244488]] (duration: 01m 07s)
* 00:36 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:08 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 00:06 pt1979@cumin2001: START - Cookbook sre.hosts.downtime


== 2020-02-12 ==
== 2021-07-16 ==
* 23:46 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:44 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:11 XioNoX: deactivate BGP to office's router1 while it's on maintenance
* 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
* 21:59 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
* 21:58 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P<nowiki>{</nowiki>elastic2*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
* 21:57 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
* 21:53 chaomodus: restart nagios-nrpe-service on cumin1001 after it had oomed
* 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
* 21:51 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:48 vgutierrez: restart pybal on lvs2010
* 21:51 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 21:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 21:18 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:24 godog: downtime flappy pages in codfw for 40 minutes
* 21:18 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
* 15:14 godog: set alert2001 as active in netbox (was staged) - [[phab:T247966|T247966]]
* 21:10 marxarelli: completed group1 to 1.35.0-wmf.19
* 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
* 21:00 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.19 (duration: 01m 03s)
* 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
* 20:59 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.19
* 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw [[phab:T286787|T286787]]
* 20:49 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T232563|T232563]] - Remove SERVER_SOFTWARE override (duration: 01m 03s)
* 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
* 20:39 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T72470|T72470]] - Disable wgLegacyJavaScriptGlobals on svwiki (duration: 01m 08s)
* 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Don't use hex escapes in the name of cawiki (duration: 01m 04s)
* 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
* 19:47 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T243503|T243503]] [itwiki] Move assignment of 'mover' group from sysops to bureaucrats (duration: 01m 02s)
* 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
* 19:42 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T243509|T243509]] [zh_classicalwiki] Enable new user message for auto-created accounts (duration: 01m 03s)
* 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 19:38 James_F: Ran mwscript maintenance/namespaceDupes.php --wiki=mywiki --fix and mwscript maintenance/namespaceDupes.php --wiki=mywiktionary --fix on mwmaint1002
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 19:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244980|T244980]] Localise $wgMetaNamespace for mywiki and mywiktionary (duration: 01m 03s)
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 19:30 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T244205|T244205]] [newiki] Set local timezone to Kathmandu (duration: 01m 03s)
* 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 19:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T241883|T241883]] [fywiktionary] Set a local wgSitename (duration: 01m 03s)
* 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 19:12 jforrester@deploy1001: Synchronized wmf-config/throttle-analyze.php: Replace deprecated IP class with IPUtils (no-op sync) (duration: 01m 03s)
* 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 18:31 mutante: irc2001 - manually run the "$<nowiki>{</nowiki>v6_token_cmd<nowiki>}</nowiki> && $<nowiki>{</nowiki>v6_flush_dyn_cmd<nowiki>}</nowiki>" commands from interface::add_ip6_mapped to debug 'Interface::Add_ip6_mapped[main]/Augeas[ens5_v6_token]: Could not evaluate: Saving failed' but it does not reproduce the puppet error ... ([[phab:T244719|T244719]])
* 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
* 17:57 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/pager/IndexPager.php: [[phab:T244941|T244941]] IndexPager: Cast properties passed to implode to arrays (duration: 01m 03s)
* 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
* 17:27 jeh: upgrade RAID firmware on cloudvirt1024 to 25.5.6.0009 [[phab:T241884|T241884]]
* 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers ([[phab:T279309|T279309]])
* 17:22 bblack: ns1.wikimedia.org - re-route back to original authdns2001 destination
* 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 17:11 brennen: restarting jenkins for updates
* 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 17:09 vgutierrez: disabling KA between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 17:01 vgutierrez: rolling back cp4026 and cp4032 to trafficserver 8.0.5-1wm15
* 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 17:00 vgutierrez: depool cp40[26,32]
* 12:39 mutante: mw1412 through mw1428 - set to active in netbox ([[phab:T279309|T279309]])
* 16:53 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:52 vgutierrez: pool cp20[06,14] running buster - [[phab:T242093|T242093]]
* 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
* 16:51 bblack@cumin1001: START - Cookbook sre.hosts.downtime
* 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
* 16:49 moritzm: installing openjpeg2 security updates
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
* 16:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
* 16:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
* 16:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
* 15:56 vgutierrez: Enable KA and disable parent proxies on cp4031 - [[phab:T244464|T244464]]
* 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
* 15:50 vgutierrez: depool cp20[06,14] and reimage as buster - [[phab:T242093|T242093]]
* 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
* 15:49 volans: spicerack upgraded to 0.0.30-1 on both cumin hosts
* 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
* 15:48 vgutierrez: pool cp20[07,17] running buster - [[phab:T242093|T242093]]
* 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 15:46 bblack: authdns2001 - shutting down for hardware work - [[phab:T242017|T242017]]
* 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 15:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:39 jeh: clearing foreign drive RAID configuration on cloudvirt1024 [[phab:T241884|T241884]]
* 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 15:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 15:32 marostegui: Disable event handler for db1095 RAID check on icinga - [[phab:T244958|T244958]]
* 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 15:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 15:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
* 15:25 jeh: upgrade BIOS firmware on cloudvirt1024 to 2.4.8 [[phab:T241884|T241884]]
* 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
* 15:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
* 15:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:02 vgutierrez: depool cp20[07,17] and reimage as buster - [[phab:T242093|T242093]]
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:34 XioNoX: repool eqsin
* 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:31 moritzm: reimage logstash2026 to test new standard RAID0 partman recipe
* 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:00 vgutierrez: pool cp20[10,18] running buster - [[phab:T242093|T242093]]
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10393 and previous config saved to /var/cache/conftool/dbconfig/20200212-135514-marostegui.json
* 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 13:39 akosiaris: revert sessionstore on mw1331, mw1348 so that it times out instead of returning TCP RSTs. Testing for [[phab:T243106|T243106]]
* 13:36 XioNoX: re-enable transit/peering on cr1-eqsin - [[phab:T244944|T244944]]
* 13:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:23 akosiaris: mangle sessionstore on mw1331, mw1348 so that it times out instead of returning TCP RSTs. Testing for [[phab:T243106|T243106]]
* 13:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:22 XioNoX: cr1-eqsin RE failover (final) - [[phab:T244944|T244944]]
* 13:21 marostegui: Restart wikibugs as phab comments aren't showing up on irc - [[phab:T241109|T241109]]
* 13:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:18 jynus: setting up db1140 under maintenance (upgrade, reboot, disable alerts)
* 13:15 vgutierrez: disabling KA between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 13:10 moritzm: upgrading debdeploy fleet-wide to 0.0.99.13
* 13:08 moritzm: uploaded libapache2-mod-auth-cas 1.2-1~deb8u1 for jessie-wikimedia to apt.wikimedia.org
* 13:05 vgutierrez: depool cp20[10,18] and reimage as buster - [[phab:T242093|T242093]]
* 13:05 vgutierrez: pool cp20[12,20] running buster - [[phab:T242093|T242093]]
* 12:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:53 XioNoX: cr1-eqsin RE failover - [[phab:T244944|T244944]]
* 12:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:35 vgutierrez: depool cp20[12,20] and reimage as buster - [[phab:T242093|T242093]]
* 12:34 vgutierrez: pool cp20[13,22] running buster - [[phab:T242093|T242093]]
* 12:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:21 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571705{{!}}Triple the factor of WDQS lag to maxlag for Wikidata (T244722)]], take II, the cache issue (duration: 01m 03s)
* 12:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571705{{!}}Triple the factor of WDQS lag to maxlag for Wikidata (T244722)]] (duration: 01m 04s)
* 12:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:12 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571412{{!}}Enable ContentTranslation out of beta in bs and mk WPs (T244139, T244140)]] (duration: 01m 15s)
* 12:08 vgutierrez: depool cp2013 and reimage as buster - [[phab:T242093|T242093]]
* 12:06 vgutierrez: pool cp2016 running buster - [[phab:T242093|T242093]]
* 12:01 vgutierrez: depool cp20[16,22] and reimage as buster - [[phab:T242093|T242093]]
* 11:57 vgutierrez: pool cp20[19,24] running buster - [[phab:T242093|T242093]]
* 11:53 akosiaris: mangle sessionstore on mw1331 so that it is unreachable. Testing for [[phab:T243106|T243106]]
* 11:49 vgutierrez: repooling cp40[26,32]
* 11:39 vgutierrez: pool cp3050 running buster - [[phab:T242093|T242093]]
* 11:37 vgutierrez: depooling cp[4026,4032]
* 11:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:18 vgutierrez: depool cp2024 and reimage as buster - [[phab:T242093|T242093]]
* 11:17 vgutierrez: pool cp2025 running buster - [[phab:T242093|T242093]]
* 11:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:15 vgutierrez: depool cp2016 and reimage as buster - [[phab:T242093|T242093]]
* 11:14 vgutierrez: pool cp2019 running buster - [[phab:T242093|T242093]]
* 11:11 moritzm: reimage logstash2026 to test new standard RAID0 partman recipe
* 11:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:50 vgutierrez: depool cp3050 and reimage as buster - [[phab:T242093|T242093]]
* 10:49 vgutierrez: pool cp30[51,52] running buster - [[phab:T242093|T242093]]
* 10:45 vgutierrez: depool cp20[19,25] and reimage as buster - [[phab:T242093|T242093]]
* 10:42 vgutierrez: pool cp2026 running buster - [[phab:T242093|T242093]]
* 10:36 vgutierrez: pool cp2023 running buster - [[phab:T242093|T242093]]
* 10:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:34 moritzm: bouncing ferm on ganeti1016, failed to start after boot
* 10:32 vgutierrez: Enable KA between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 10:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:12 vgutierrez: testing trafficserver 8.0.6-rc0 in cp40[26,32]
* 10:06 vgutierrez: depool cp20[23,26] and reimage as buster - [[phab:T242093|T242093]]
* 10:01 vgutierrez: depool cp30[51-52] and reimage as buster - [[phab:T242093|T242093]]
* 09:38 ema: cp: rolling ats-tls-restart to enable analytics logging [[phab:T237993|T237993]]
* 09:26 ema: cp4027: ats-tls-restart to enable analytics logging to pipe [[phab:T237993|T237993]]
* 09:25 moritzm: rolling restart of cassandra on restbase-dev to pick up Java security updates
* 09:17 marostegui: Failover m2 master dbproxy from dbproxy1007 to dbproxy1013 - [[phab:T202367|T202367]]
* 09:13 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 09:11 marostegui: Upgrade and reboot dbproxy1013 before making it master - [[phab:T202367|T202367]]
* 08:55 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 08:46 phedenskog@deploy1001: Finished deploy [performance/navtiming@9bbbb58]: (no justification provided) (duration: 00m 05s)
* 08:46 phedenskog@deploy1001: Started deploy [performance/navtiming@9bbbb58]: (no justification provided)
* 08:38 marostegui: Restart wikibugs as it doesn't show phab comments on irc - [[phab:T241109|T241109]]
* 08:21 moritzm: installing mesa security updates
* 07:28 vgutierrez: pool cp30[53-54] running buster - [[phab:T242093|T242093]]
* 07:18 oblivian@puppetmaster1001: conftool action : set/weight=30; selector: dc=eqiad,pool=appserver,name=mw132[3-4].*
* 07:16 oblivian@puppetmaster1001: conftool action : set/weight=20; selector: dc=eqiad,pool=appserver,service=nginx,name=mw12[3-5].*
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 20 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10391 and previous config saved to /var/cache/conftool/dbconfig/20200212-070250-marostegui.json
* 06:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:46 marostegui: Redact ngwikimedia on db1124:3313 and db2094:3313 [[phab:T240772|T240772]]
* 06:22 vgutierrez: depool cp30[53-54] and reimage as buster - [[phab:T242093|T242093]]
* 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 06:17 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:16 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 01:48 XioNoX: disabling peering session on cr1-eqsin (they're flapping otherwise)
* 00:44 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/page/ImageHistoryPseudoPager.php: [[phab:T244937|T244937]] ImageHistoryPseudoPager: Update doQuery() for IndexPager changes (duration: 01m 03s)
* 00:38 XioNoX: reboot cr1-eqsin
* 00:33 XioNoX: commit full on cr1-eqsin - [[phab:T243080|T243080]]
* 00:21 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: rm wgKartographerIconServer (duration: 01m 02s)
* 00:20 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: rm wgKartographerIconServer (duration: 01m 03s)
* 00:16 eileen: civicrm revision changed from {{Gerrit|ee9edf8137}} to {{Gerrit|55b2afb6eb}}, config revision is {{Gerrit|561ae21f77}}


== 2020-02-11 ==
== 2021-07-15 ==
* 22:04 XioNoX: switchover RE mastership back re0 on cr1-eqsin - [[phab:T243080|T243080]]
* 23:32 brennen: checking stashbot: [[phab:T286756|T286756]]
* 21:50 XioNoX: reboot re0:cr1-eqsin (backup) - [[phab:T243080|T243080]]
* 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: [[gerrit:704815{{!}}Fix creation of mw.Message objects (T286385)]] (duration: 00m 57s)
* 21:45 cdanis: repool eqiad
* 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # [[phab:T285811|T285811]]
* 21:37 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^cp107.*
* 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # [[phab:T285811|T285811]]
* 21:36 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^cp108.*
* 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # [[phab:T285811|T285811]]
* 21:36 bblack: re-pooling all cp10xx in eqiad
* 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
* 21:32 XioNoX: switchover RE mastership on cr1-eqsin - [[phab:T243080|T243080]]
* 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 21:14 robh: cp1067 powered back into service post firmware update via [[phab:T243167|T243167]]
* 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki [[phab:T284928|T284928]]
* 21:11 cdanis: depool eqiad
* 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
* 21:01 marxarelli: completed group0 to 1.35.0-wmf.19 ([[phab:T233867|T233867]])
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 20:57 robh: cp108[45] returned to service, depooling cp108[67] for firmware update via [[phab:T243167|T243167]]
* 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 20:54 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.19
* 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 20:53 mutante: gerrit - moving gerrit db_pass from private module passwords to private hieradata
* 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 20:51 XioNoX: reboot backup RE on cr1-eqsin - [[phab:T243080|T243080]]
* 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 20:38 robh: depooling cp108[45] for firmware update via [[phab:T243167|T243167]]
* 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 20:32 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.19 and rebuild l10n cache (duration: 37m 31s)
* 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 20:19 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide: (duration: 00m 02s)
* 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
* 20:19 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
* 20:18 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide: (duration: 00m 03s)
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 20:18 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 20:08 XioNoX: depool eqsin for router upgrade - [[phab:T243080|T243080]]
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 20:01 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide: (duration: 00m 04s)
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 20:01 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
* 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:55 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.19 and rebuild l10n cache
* 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 06s)
* 19:43 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.16 (duration: 01m 48s)
* 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 07s)
* 19:42 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.15 (duration: 01m 51s)
* 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:38 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.14 (duration: 02m 08s)
* 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
* 19:36 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.11 (duration: 10m 53s)
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:35 marxarelli: running `scap clean --delete` for old wmf branches wmf.11, wmf.14, wmf.15, wmf.16 ([[phab:T233867|T233867]])
* 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
* 19:03 volans: uploaded spicerack_0.0.30-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
* 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:00 Urbanecm: Create User:Ammarpad on ngwikimedia and promote to sysop, bureaucrat ([[phab:T240771|T240771]])
* 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
* 18:48 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.18
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 18:43 twentyafterfour: getting ready to deploy wmf.18 refs [[phab:T233866|T233866]]
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 18:42 greg-g: restarting stashbot
* 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
* 18:35 bblack: ns1.wikimedia.org - changing static route destination on cr[12]-codfw from authdns2001 to dns2002 - [[phab:T242017|T242017]]
* 16:40 ejegg: updated payments-wiki from {{Gerrit|d9892207c1}} to {{Gerrit|844b59ee42}}
* 18:33 Urbanecm: Create ngwikimedia is done ([[phab:T240771|T240771]])
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 03s)
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 18:24 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 06s)
* 16:27 ejegg: updated fundraising CiviCRM from {{Gerrit|e0d53c92b5}} to {{Gerrit|20e9ef6bbb}}
* 18:21 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Create ngwikimedia ([[phab:T240771|T240771]])
* 16:24 ejegg: updated payments-wiki from {{Gerrit|0e7800027a}} to {{Gerrit|844b59ee42}}
* 18:20 dpifke@deploy1001: Finished deploy [performance/navtiming@b471b64]: (no justification provided) (duration: 00m 05s)
* 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 18:20 dpifke@deploy1001: Started deploy [performance/navtiming@b471b64]: (no justification provided)
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 18:19 urbanecm@deploy1001: Synchronized dblists/: Create ngwikimedia ([[phab:T240771|T240771]]) (duration: 01m 06s)
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 17:57 bblack: reboot dns2002 post-reimaging
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 17:13 vgutierrez: Disable KA on cp4031 - [[phab:T244464|T244464]]
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:49 vgutierrez: pool cp3055 running buster - [[phab:T242093|T242093]]
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:43 vgutierrez: repooling cp4031
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:38 vgutierrez: depooling cp4031 for some KA tests
* 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704773{{!}}flaggedrevs: Allow admins of idwiki to change stablesettings (T268317)]], try II (duration: 01m 05s)
* 16:25 vgutierrez: pool cp3056 running buster - [[phab:T242093|T242093]]
* 15:03 Amir1: temporary becoming admin on idwiki to debug [[phab:T268317|T268317]]
* 16:23 bblack: dns2002 - shutting down for hardware work and reinstall - [[phab:T242017|T242017]]
* 15:02 moritzm: installing nginx security updates on ms-fe*
* 16:21 bblack: dns2002 - stopping bird adverts to depool service for [[phab:T242017|T242017]]
* 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:20 bblack: dns2002 - downtimed in icinga for [[phab:T242017|T242017]]
* 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 16:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 15:38 vgutierrez: depool cp3056 and reimage as buster - [[phab:T242093|T242093]]
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 15:36 vgutierrez: pool cp3058 running buster - [[phab:T242093|T242093]]
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 15:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Configuring test.event stream in beta, no-op in prod - [[phab:T242122|T242122]] (duration: 01m 08s)
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 15:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 15:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
* 14:58 vgutierrez: depool cp3055 and reimage as buster - [[phab:T242093|T242093]]
* 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 14:56 vgutierrez: pool cp3057 running buster - [[phab:T242093|T242093]]
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 14:52 moritzm: pruning old CAS logs (predating the current logger config for /var/log/cas/*) from idp1001/idp2001
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 14:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:21 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=labswiki --force "Ladsgroup" --custom-groups checkuser
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 14:20 vgutierrez: restart varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 14:07 vgutierrez: depool cp3057 and cp3058 and reimage as buster - [[phab:T242093|T242093]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 13:52 vgutierrez: pool cp3059 and cp3060 running buster - [[phab:T242093|T242093]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10382 and previous config saved to /var/cache/conftool/dbconfig/20200211-130343-marostegui.json
* 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
* 12:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
* 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
* 12:34 Amir1: EU SWAT is done
* 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
* 12:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
* 12:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:28 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:571339{{!}}Fix typo in the config name (T244697)]], take II, cache (duration: 01m 06s)
* 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
* 12:26 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:571339{{!}}Fix typo in the config name (T244697)]] (duration: 01m 05s)
* 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:12 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571338{{!}}Stop reading for the new term store as the default of client wikis (T244697)]], Second round, cache issue (duration: 01m 07s)
* 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet
* 12:10 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:571338{{!}}Stop reading for the new term store as the default of client wikis (T244697)]] (duration: 01m 11s)
* 13:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 12:04 vgutierrez: depool cp3059 and cp3060 and reimage as buster - [[phab:T242093|T242093]]
* 13:05 mutante: mw1413 - pooling, was depooled but for unknown reason, don't see it in SAL, looks ok, scap pulled
* 11:59 vgutierrez: repool cp3061 and cp3062 running buster - [[phab:T242093|T242093]]
* 13:03 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1422.eqiad.wmnet
* 11:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 11:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 11:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 11:20 vgutierrez: ats-tls effectively reusing connections between ats-tls and varnish-fe on cp4031 - [[phab:T244464|T244464]]
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:56 vgutierrez: depool cp3062 and reimage as buster - [[phab:T242093|T242093]]
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:54 vgutierrez: repool cp3064 running buster - [[phab:T242093|T242093]]
* 12:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[3-5].eqiad.wmnet
* 10:51 vgutierrez: depool cp3061 and reimage as buster - [[phab:T242093|T242093]]
* 12:54 mutante: mw1423, mw1424, mw1425 - pooled as new API servers
* 10:50 vgutierrez: repool cp5006 and cp3063 running buster - [[phab:T242093|T242093]]
* 12:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 10:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:53 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[3-5].eqiad.wmnet
* 10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 10:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 10:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 10:25 mvolz@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 10:18 mvolz@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
* 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade [[phab:T273281|T273281]]
* 10:11 mvolz@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
* 12:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 10:07 vgutierrez: rolling restart of ats-tls in ulsfo - [[phab:T244464|T244464]]
* 12:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade [[phab:T273281|T273281]]
* 09:57 vgutierrez: depool cp3063 and cp3064 and reimage as buster - [[phab:T242093|T242093]]
* 12:34 mutante: mw1423, mw1424, mw1425 - scap pull
* 09:52 vgutierrez: depool cp5006 and reimage as buster - [[phab:T242093|T242093]]
* 12:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:52 vgutierrez: pool cp5007 running buster - [[phab:T242093|T242093]]
* 12:09 mutante: mw1423,mw1424,mw1425 - rebooting
* 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1107 weight from 10 to 11', diff saved to https://phabricator.wikimedia.org/P10380 and previous config saved to /var/cache/conftool/dbconfig/20200211-083812-marostegui.json
* 11:48 moritzm: restarting restbase1028-1030 to pick up libuv security update
* 08:25 marostegui: Upgrade db1095:3312, db1095:3313
* 11:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10379 and previous config saved to /var/cache/conftool/dbconfig/20200211-082204-marostegui.json
* 11:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10378 and previous config saved to /var/cache/conftool/dbconfig/20200211-081421-marostegui.json
* 11:47 mutante: mw1423, mw1424, mw1425 - initial puppet run, new API appservers going into production
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight from 5 to 10 for db1107 - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10377 and previous config saved to /var/cache/conftool/dbconfig/20200211-081319-marostegui.json
* 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704527{{!}}Make idwiki use protect mode of flaggedrevs (T268317)]] (duration: 01m 07s)
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10376 and previous config saved to /var/cache/conftool/dbconfig/20200211-080458-marostegui.json
* 11:40 moritzm: restarting Etherpad to pick up libuv security update
* 07:57 akosiaris: [[phab:T242705|T242705]] systemctl stop uwsgi-ores on ores2001.
* 11:37 moritzm: restarting Turnilo to pick up libuv security update
* 07:54 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:34 moritzm: installing libuv1 security updates
* 07:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 10 hosts
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10375 and previous config saved to /var/cache/conftool/dbconfig/20200211-075358-marostegui.json
* 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 07:47 marostegui: Upgrade es1013 - [[phab:T239791|T239791]]
* 11:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1013 - [[phab:T239791|T239791]]', diff saved to https://phabricator.wikimedia.org/P10374 and previous config saved to /var/cache/conftool/dbconfig/20200211-074358-marostegui.json
* 11:05 volans@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
* 07:23 vgutierrez: depool cp5007 and reimage as buster - [[phab:T242093|T242093]]
* 10:56 volans: commented out cron-spam entries on thanos-fe2001, puppet is disabled, thanos-store.service fails to start - [[phab:T285835|T285835]]
* 07:22 vgutierrez: pool cp5001 and cp5008 running buster - [[phab:T242093|T242093]]
* 10:41 godog: move wikibase.queryService.ui.app to wikibase.queryService.ui.index.app - [[phab:T272128|T272128]]
* 07:21 marostegui: Remove partitions from db2086:3318 - [[phab:T239453|T239453]]
* 10:34 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10373 and previous config saved to /var/cache/conftool/dbconfig/20200211-071936-marostegui.json
* 10:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318 [[phab:T239453|T239453]]', diff saved to https://phabricator.wikimedia.org/P10372 and previous config saved to /var/cache/conftool/dbconfig/20200211-071639-marostegui.json
* 10:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 for 10.4 testing - [[phab:T242702|T242702]]', diff saved to https://phabricator.wikimedia.org/P10371 and previous config saved to /var/cache/conftool/dbconfig/20200211-070720-marostegui.json
* 10:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 07:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 06:59 marostegui: Stop haproxy on dbproxy1001 - [[phab:T244463|T244463]]
* 10:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 06:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:26 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 06:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 06:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 06:48 marostegui: Remove grants in m1 for dbproxy1001 - [[phab:T231280|T231280]]
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 06:25 vgutierrez: depool cp5001 & cp5008 and reimage as buster - [[phab:T242093|T242093]]
* 10:02 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 06:18 marostegui: Failover m1-master from dbproxy1014 to dbproxy1012 - [[phab:T202367|T202367]]
* 10:02 effie: disabling puppet on maps* for 704394
* 00:26 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.18/skins/MinervaNeue: SWAT: Revert: Reduce userContributions icon code (duration: 01m 06s)
* 09:38 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 00:20 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Give NS_HELP same weight as NS_MAIN in search on wikitech (duration: 01m 06s)
* 09:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 00:15 ebernhardson@deploy1001: Synchronized wmf-config/: SWAT: Enable SpecialMute page on all wikis (duration: 01m 06s)
* 09:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:11 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-8].eqiad.wmnet
* 09:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T278619|T278619]]
* 08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 08:29 elukey: sudo rm /etc/rawdog/en/feeds/847a7185.state* on planet1002 (corrupted file) - backup in /home/elukey + restart planet-update-en.service
* 08:12 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-8].eqiad.wmnet
* 08:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 08:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
* 07:48 moritzm: updated bullseye d-i image for latest daily build [[phab:T275873|T275873]]
* 07:31 godog: reimage thanos-fe2001 with bullseye - [[phab:T285835|T285835]]
* 07:23 elukey: restart planet-update-en.service on planet1002
* 07:17 elukey: remove /etc/rawdog/en/<nowiki>{</nowiki>state,state.lock<nowiki>}</nowiki> on planet1002 (following what rawdog suggested) due to corrupted files (backups available in /home/elukey/en)
* 06:51 elukey: restart phabricator_clean_tmp_files.service on phab1001 - transient error (tmp files already cleaned up)
* 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 06s)
* 06:47 tstarling@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: don't delete non-existent rows ([[phab:T286521|T286521]]) (duration: 01m 07s)
* 05:50 kart_: Updated cxserver to 2021-07-14-124232-production ([[phab:T282369|T282369]], [[phab:T284450|T284450]])
* 05:47 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:43 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:41 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 00:00 twentyafterfour: phabricator update deployed.


== 2020-02-10 ==
== 2021-07-14 ==
* 23:30 robh: cp108[23] returned to service via [[phab:T243167|T243167]]
* 23:23 eileen: civicrm revision changed from {{Gerrit|b1c63470bb}} to {{Gerrit|e0d53c92b5}}, config revision is {{Gerrit|bb405c5232}}
* 23:28 legoktm: restarting zuul
* 21:19 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.14 (duration: 01m 05s)
* 23:26 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/OATHAuth/src/Key/TOTPKey.php: [[phab:T244308|T244308]] (duration: 01m 04s)
* 21:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.14
* 23:25 reedy@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/OATHAuth/src/Key/TOTPKey.php: [[phab:T244308|T244308]] (duration: 01m 07s)
* 21:08 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/User.php: Backport: [[gerrit:704609{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 23:06 robh: cp108[01] returned to service, cp108[23] offline for bios update via [[phab:T243167|T243167]]
* 20:58 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/User.php: Backport: [[gerrit:704608{{!}}Move saving user options to onTransactionPreCommitOrIdle (T286521)]] (duration: 01m 05s)
* 22:50 chasemp: phab1001:~# sudo /srv/phab/phabricator/bin/bulk make-silent --id 2164
* 20:51 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.14
* 22:45 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add authevents as monolog channel (duration: 01m 06s)
* 19:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/resources: Backport: [[gerrit:704606{{!}}Fix deprecated offset() on invalid DOM (T185629)]] (duration: 01m 07s)
* 22:43 robh: cp107[789] returned to service, cp108[01] offline for bios update via [[phab:T243167|T243167]]
* 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@156a984]: fix trove-dashboard bug (duration: 04m 18s)
* 22:42 robh: cp107[89] returned to service, cp108[01] offline for bios update via [[phab:T243167|T243167]]
* 19:26 andrew@deploy1002: Started deploy [horizon/deploy@156a984]: fix trove-dashboard bug
* 21:58 robh: cp107[56] returned to service, cp107[78] offline for bios update via [[phab:T243167|T243167]]
* 19:17 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 21:43 arlolra: Updated Parsoid to {{Gerrit|612106d2}} ([[phab:T244412|T244412]], [[phab:T244413|T244413]], [[phab:T242746|T242746]], [[phab:T235273|T235273]], [[phab:T235307|T235307]], [[phab:T238845|T238845]], [[phab:T204618|T204618]], [[phab:T240054|T240054]])
* 19:17 nskaggs@cumin1001: Added views for new wiki: dagwiki [[phab:T284456|T284456]]
* 21:38 robh: cp1075 & cp1076 offline for bios updates per [[phab:T243167|T243167]]
* 18:55 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 21:36 robh: cp1075 and cp1076 going offline for bios updates. This will cause a bit of cp irc icinga noise, but no paging. Not putting into maint mode, as there is no way to maint mode the noisiest check (which checks all backends and thus shouldn't be disabled)
* 18:54 nskaggs@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
* 21:33 arlolra@deploy1001: Finished deploy [parsoid/deploy@d2d4870]: Updating Parsoid to {{Gerrit|612106d2}} (duration: 10m 26s)
* 18:54 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 21:32 XioNoX: clamp tcp-mss on cr2-eqiad:xe-3/3/3
* 18:36 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 21:23 arlolra@deploy1001: Started deploy [parsoid/deploy@d2d4870]: Updating Parsoid to {{Gerrit|612106d2}}
* 18:36 nskaggs@cumin1001: Added views for new wiki: banwikisource [[phab:T284390|T284390]]
* 21:12 halfak@deploy1001: Finished deploy [ores/deploy@a6f4f14]: [[phab:T242705|T242705]] (duration: 12m 18s)
* 18:30 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 21:00 halfak@deploy1001: Started deploy [ores/deploy@a6f4f14]: [[phab:T242705|T242705]]
* 18:14 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 20:55 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/MachineVision: MachineVision: Fix page id parsing from imageinfo results ([[phab:T244752|T244752]]) (duration: 01m 11s)
* 17:52 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 20:14 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/MachineVision: MachineVision: Fix page id parsing from imageinfo results ([[phab:T244752|T244752]]) (duration: 01m 15s)
* 17:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 19:31 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:570393]] Config: Session Store: Switch group0 and group1 to kask-session [[phab:T243106|T243106]] (duration: 01m 06s)
* 17:49 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
* 19:28 mutante: Gerrit - added eevans to 'wmf-deployment' group ([[phab:T244508|T244508]])
* 17:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
* 19:12 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T242122|T242122]] Load new EventStreamConfig extension if so configured (duration: 01m 06s)
* 17:39 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 19:07 jforrester@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
* 17:35 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 19:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T242122|T242122]] Set default of wmgUseEventStreamConfig false everywhere (duration: 01m 06s)
* 17:35 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704383{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 06s)
* 18:39 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.18 refs [[phab:T233866|T233866]] (duration: 01m 05s)
* 17:00 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: [[gerrit:704382{{!}}Do not lock preferences row for a rememberpassword check (T286521)]] (duration: 01m 05s)
* 18:38 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.18 refs [[phab:T233866|T233866]]
* 16:27 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
* 18:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.18 refs [[phab:T233867|T233867]]
* 16:26 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
* 18:21 twentyafterfour: MediaWiki train: finally moving forward with group0 wikis to 1.35.0-wmf.18 refs [[phab:T233866|T233866]]
* 16:11 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Translate: Backport: [[gerrit:704404{{!}}TranslationAid: Handle empty message definition (T285830)]] and [[gerrit:704405{{!}}TranslationAid: Make sure to return successfully fetched definitions (T285830)]] (duration: 01m 09s)
* 17:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T244561|T244561]] Set Kartographer servers to Wikimedia servers (duration: 01m 06s)
* 16:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 16:48 moritzm: installing libexif security updates on jessie
* 15:37 moritzm: installing klibc security updates
* 16:22 vgutierrez: pooling cp5002 and cp5009 running buster - [[phab:T242093|T242093]]
* 15:36 ottomata: deploying eventgate-analytics with direct service-runner Prometheus support
* 15:45 XioNoX: push outbound flowspec support to core routers
* 15:34 ryankemper: [Elastic] Manually triggering readahead mitigation across whole fleet to prevent any further issues today: `ryankemper@cumin1001:~$ sudo cumin -b 12 'P<nowiki>{</nowiki>elastic*<nowiki>}<<