You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: disable lilypond execution again (duration: 01m 10s))
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(324 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== 2020-07-31 ==
== 2021-08-03 ==
* 00:47 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: disable lilypond execution again (duration: 01m 10s)
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:07 catrope@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/Echo/modules/mobile/notificationsFilterOverlay.js: [[phab:T258954|T258954]] (duration: 01m 06s)
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:06 catrope@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Echo/modules/mobile/notificationsFilterOverlay.js: [[phab:T258954|T258954]] (duration: 01m 10s)
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 00:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2015.codfw.wmnet
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master  [[phab:T287574|T287574]]
* 09:12 moritzm: installinh php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2020-07-30 ==
== 2021-08-02 ==
* 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 22:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2014.codfw.wmnet
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:12 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.1
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:52 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 21:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 21:51 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:31 tzatziki: removing 1 file for legal compliance
* 21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:16 tzatziki: removing 7 files for legal compliance
* 21:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2013.codfw.wmnet
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 21:41 mutante: revoking and resigning puppet cert for xhgui2001.codfw.wmnet [[phab:T259206|T259206]]
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:40 catrope@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/GrowthExperiments/: [[phab:T258609|T258609]] (duration: 01m 06s)
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 21:39 catrope@deploy1001: Synchronized php-1.36.0-wmf.2/skins/MinervaNeue/: [[phab:T258939|T258939]] (duration: 01m 08s)
* 19:00 urbanecm: Morning B&C window completed
* 21:30 mutante: reinstalling xhgui2001
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 21:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 21:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 21:10 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@e797cf0]: 0.3.42 (duration: 24m 41s)
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2012.codfw.wmnet
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 20:45 ryankemper@deploy1001: Started deploy [wdqs/wdqs@e797cf0]: 0.3.42
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 20:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 20:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 20:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 20:08 mutante: [wtp2012:~] $ sudo rm -rf /srv/deployment/parsoid/deploy-cache
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 19:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2011.codfw.wmnet
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 19:13 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.36.0-wmf.1
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:10 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.2
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 19:01 mforns@deploy1001: Finished deploy [analytics/refinery@adb0d09]: Regular analytics weekly train [analytics/refinery@adb0d09b6584a7a26143623cf6173ae8983423e3] (duration: 10m 41s)
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:59 mutante: imported twig (php-twig) into APT repo
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 18:50 mforns@deploy1001: Started deploy [analytics/refinery@adb0d09]: Regular analytics weekly train [analytics/refinery@adb0d09b6584a7a26143623cf6173ae8983423e3]
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 18:34 Urbanecm: Morning B&C done
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:32 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/Kartographer/modules/box/Map.js: {{Gerrit|aa3dbd54f8e422a511e55b5efba6b5f48253dbe7}}: Disable panning and zooming until ready ([[phab:T257872|T257872]]) (duration: 01m 06s)
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 617516: Add import sources for yuewiktionary {{!}} https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/617516; 617518: Fix definition of yuewiktionary import sources {{!}} https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/617518 # [[phab:T258913|T258913]] (duration: 01m 06s)
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 18:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 18:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 18:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|14ef2ec2956fd8f66be7efb3c5978ac0eda7ce97}}: sysop_itwiki: Set favicon to Wikimedia_logo_blue.svg ([[phab:T259243|T259243]]; 2/2) (duration: 01m 06s)
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 18:04 urbanecm@deploy1001: Synchronized static/favicon/wmf-blue.ico: {{Gerrit|14ef2ec2956fd8f66be7efb3c5978ac0eda7ce97}}: sysop_itwiki: Set favicon to Wikimedia_logo_blue.svg ([[phab:T259243|T259243]]; 1/2) (duration: 01m 06s)
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 17:04 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@d3ab874]: airflow: refinery_drop_hive_partitions: Fix kerberos token passing (duration: 00m 55s)
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 17:03 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@d3ab874]: airflow: refinery_drop_hive_partitions: Fix kerberos token passing
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 16:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 16:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 16:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 16:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 16:44 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 16:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 16:44 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 16:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 12:20 mutante: gerrit servers: disabling puppet
* 16:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 16:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 16:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 16:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 16:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 16:38 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 14:30 moritzm: installing squid security updates
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 14:04 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 13:50 volans@cumin1001: START - Cookbook sre.dns.netbox
* 11:27 hashar: restarting Jenkins on contint2001
* 13:47 vgutierrez: upgrade acme-chief to version 0.27 - [[phab:T255249|T255249]]
* 11:27 hashar: restarting Jenkins on contint1001
* 13:47 vgutierrez: upload acme-chief 0.27 to apt.wm.o (buster) - [[phab:T255249|T255249]]
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:46 moritzm: installing qemu security updates on Buster
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:02 jayme: imported chartmuseum_0.12.0-3 to buster-wikimedia
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 12:07 elukey: upgrade of the druid public cluster (serving AQS) from 0.12.3 to 0.19
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:53 urbanecm@deploy1001: Synchronized static/favicon/: {{Gerrit|c08f774b9b05cb9c5faf692c59dd45bf5d65b557}}: Revert "sysop_itwiki: Set favicon to Wikimedia_logo_blue.svg" ([[phab:T259243|T259243]]) (duration: 01m 06s)
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 11:48 urbanecm@deploy1001: Synchronized static/favicon/wmf-blue.ico: {{Gerrit|399e9c5d949ade5f574ef965e05288b7253c3c3e}}: sysop_itwiki: Set favicon to Wikimedia_logo_blue.svg ([[phab:T259243|T259243]]; 1/2) (duration: 01m 06s)
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:46 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fc48441f26afc6f4e97c5f7e96185d04cacb0f4b}}: Add import sources to sysop_itwiki ([[phab:T259243|T259243]]) (duration: 01m 08s)
* 11:13 urbanecm: EU B&C window completed
* 11:44 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|fc5de151ee2c9cf5c64a7d13b2e65e39bb349296}}: ClosedWikiProvider: Do not run when $wmgUseCentralAuth is false ([[phab:T259246|T259246]]) (duration: 01m 07s)
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 11:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7aa0c2361a1e0e363700d54a109b81497b5a045b}}: sysop_itwiki: Add several pages to wgWhitelistRead ([[phab:T259243|T259243]]) (duration: 01m 06s)
* 11:08 moritzm: installing openjdk-11 security updates
* 11:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5ea4bc87ee3b5ed1ef4f7cf2b1068e678f6eb42c}}: sysop_itwiki: Add WP as an alias for NS_PROJECT ([[phab:T259243|T259243]]) (duration: 01m 08s)
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 10:49 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.2 (duration: 01m 07s)
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 10:48 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.2
* 07:24 moritzm: installing libsndfile security updates on buster
* 10:38 marostegui: Reload haproxy on dbproxy1013 and dbproxy1015
* 07:12 moritzm: installing aspell security updates
* 08:43 godog: flip smokeping/librenms from netmon2001 to netmon1002 - [[phab:T247967|T247967]]
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)
* 07:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:31 filippo@cumin1001: START - Cookbook sre.hosts.decommission
* 07:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:22 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P12129 and previous config saved to /var/cache/conftool/dbconfig/20200730-071633-marostegui.json
* 06:57 elukey: upload druid_0.19.0-1 packages to buster-wikimedia
* 05:26 marostegui: Deploy MCR schema change on labswiki (wikitech) [[phab:T238966|T238966]]
* 02:18 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2010.codfw.wmnet
* 01:53 dpifke@deploy1001: Finished deploy [performance/arc-lamp@ad87f69]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/615302 (duration: 00m 05s)
* 01:53 dpifke@deploy1001: Started deploy [performance/arc-lamp@ad87f69]: Deploying https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/615302
* 01:37 eileen: civicrm revision changed from {{Gerrit|cc5d17fbaf}} to {{Gerrit|150c3476c4}}, config revision is {{Gerrit|b6ece03513}}
* 01:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 01:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2009.codfw.wmnet
* 01:22 mutante: imported in apt.wikimedia.org for buster: php-slim, php-slim-views, php-perftools-xhgui-collector, php-pimple, php-psr-http-server-middleware, php-psr-http-server-handler, xhgui
* 01:07 mholloway-shell@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/JsonConfig: Backport: Implement GetContentModels hook ([[phab:T259126|T259126]]) (duration: 01m 07s)
* 00:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:23 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 00:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime


== 2020-07-29 ==
== 2021-07-31 ==
* 23:51 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 23:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
* 23:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2008.codfw.wmnet
* 23:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 23:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 23:09 Urbanecm: Run mwscript namespaceDupes.php --wiki=mswiktionary --fix ([[phab:T255391|T255391]])
* 23:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|396a395c79c606cb7deeb7906fefc7f16e63fa4f}}: Add several extra namespaces for mswiktionary ([[phab:T255391|T255391]]) (duration: 01m 07s)
* 22:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2006.codfw.wmnet
* 22:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2007.codfw.wmnet
* 22:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 21:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 20:35 crusnov@deploy1001: Finished deploy [netbox/deploy@fde9dfe]: Test deploy of 2.8.8 to netbox-next pt2 (duration: 00m 05s)
* 20:35 crusnov@deploy1001: Started deploy [netbox/deploy@fde9dfe]: Test deploy of 2.8.8 to netbox-next pt2
* 20:35 crusnov@deploy1001: Finished deploy [netbox/deploy@fde9dfe]: Test deploy of 2.8.8 to netbox-next (duration: 01m 12s)
* 20:34 crusnov@deploy1001: Started deploy [netbox/deploy@fde9dfe]: Test deploy of 2.8.8 to netbox-next
* 20:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2004.codfw.wmnet
* 19:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:44 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:41 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:29 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:27 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:20 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 19:19 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:18 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 19:04 qchris: Restarting Gerrit on gerrit2001 (gerrit-replica) to make security fix effective.
* 19:04 qchris@deploy1001: Finished deploy [gerrit/gerrit@9275b30]: Gerrit to v3.2.3-1-g185bdc3a69 on gerrit2001 (duration: 00m 09s)
* 19:03 qchris@deploy1001: Started deploy [gerrit/gerrit@9275b30]: Gerrit to v3.2.3-1-g185bdc3a69 on gerrit2001
* 19:00 qchris: Restarting Gerrit on gerrit1001 to make security fix effective.
* 19:00 qchris@deploy1001: Finished deploy [gerrit/gerrit@9275b30]: Gerrit to v3.2.3-1-g185bdc3a69 on gerrit1001 (duration: 00m 08s)
* 19:00 qchris@deploy1001: Started deploy [gerrit/gerrit@9275b30]: Gerrit to v3.2.3-1-g185bdc3a69 on gerrit1001
* 18:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:39 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:32 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:32 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:13 Urbanecm: Morning B&C window is done
* 18:13 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/DiscussionTools/: {{Gerrit|00ecec80d12a34977d55dd09bce0c5a1aab369f9}}: Revert new reply API for now ([[phab:T252558|T252558]]) (duration: 01m 06s)
* 18:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d54f041be6508b641eec08e25287d280374cc863}}: Enable Translate extension at plwikimedia ([[phab:T259087|T259087]]) (duration: 01m 08s)
* 18:07 urbanecm@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: {{Gerrit|a237f5b40c3662c0f08398abeeaadba61d7462f8}}: Move VisualEditor from beta to default on enwikiversity ([[phab:T258992|T258992]]) (duration: 01m 06s)
* 18:05 Urbanecm: Create tables for Translate extension in plwikimedia ([[phab:T259087|T259087]])
* 18:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2003.codfw.wmnet
* 17:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2002.codfw.wmnet
* 17:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 17:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:45 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:16 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:15 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 16:15 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:02 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 617167: Revert "Set muswiki to read only" {{!}} https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/617167 ([[phab:T259004|T259004]]) (duration: 01m 06s)
* 15:44 volans@cumin1001: START - Cookbook sre.dns.netbox
* 15:33 liw@deploy1001: rebuilt and synchronized wikiversions files: Revert "group[0{{!}}1] wikis to 1.36.0-wmf.1"
* 15:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 617152: Set muswiki to read only {{!}} https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/617152 ([[phab:T259004|T259004]]) (duration: 01m 08s)
* 15:10 jayme: imported docker-report_0.0.8-1 to buster-wikimedia
* 14:49 moritzm: installing ruby-json security updates
* 14:34 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:30 jbond42: install curl security update for jessie
* 14:29 moritzm: installing exiv2 security updates
* 14:27 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:55 volans: migrating *all* codfw mgmt DNS records to the autogenerated ones via Netbox - [[phab:T233183|T233183]]
* 13:50 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:45 volans@cumin1001: START - Cookbook sre.dns.netbox
* 13:29 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp2001.codfw.wmnet
* 13:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:05 liw@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.2 (duration: 01m 07s)
* 13:04 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.2
* 13:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 12:58 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 12:56 volans@cumin1001: START - Cookbook sre.dns.netbox
* 12:49 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 12:48 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:44 moritzm: imported curl 7.38.0-4+deb8u16+wmf1 to apt.wikimedia.org (jessie-wikimedia) [[phab:T259102|T259102]]
* 12:30 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 21s)
* 12:28 urbanecm@deploy1001: Synchronized langlist: Creating avkwiki ([[phab:T257943|T257943]]) (duration: 01m 05s)
* 12:27 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating avkwiki ([[phab:T257943|T257943]]) (duration: 01m 03s)
* 12:26 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating avkwiki ([[phab:T257943|T257943]]) (duration: 01m 06s)
* 12:24 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating avkwiki ([[phab:T257943|T257943]])
* 12:15 urbanecm@deploy1001: Synchronized dblists: Creating avkwiki ([[phab:T257943|T257943]]) (duration: 01m 06s)
* 12:14 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating avkwiki ([[phab:T257943|T257943]]) (duration: 01m 06s)
* 12:12 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating avkwiki ([[phab:T257943|T257943]]) (duration: 01m 05s)
* 12:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:07 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:07 moritzm: rebooting idp2001 for kernel update
* 11:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|252bb6c1bf83d96a14a0ef63e06eb544eef8a00b}}: Add Wikipedia wordmark for trwiki ([[phab:T255489|T255489]]; sync 2/2) (duration: 01m 05s)
* 11:39 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/wikipedia-wordmark-tr.svg: {{Gerrit|252bb6c1bf83d96a14a0ef63e06eb544eef8a00b}}: Add Wikipedia wordmark for trwiki ([[phab:T255489|T255489]]; sync 1/2) (duration: 01m 06s)
* 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9f7e03292941d0d782437862f406efa7e1c6463e}}: Fix overindentation (duration: 01m 08s)
* 11:11 Lucas_WMDE: EU B&C window done
* 11:09 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/%s\n' 'wuuwiki.png' 'wuuwiki-1.5x.png' 'wuuwiki-2x.png' {{!}} mwscript purgeList.php # [[phab:T259005|T259005]]
* 11:08 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/project-logos/: Config: [[gerrit:616760{{!}}Change the logo for Wu Wikipedia (T259005)]] (duration: 01m 08s)
* 10:40 vgutierrez: rolling upgrade of ATS to version 8.0.8-1wm2
* 10:21 tstarling@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Score/includes/Score.php: do not offer .ly downloads (duration: 01m 07s)
* 10:19 tstarling@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Score/extension.json: do not offer .ly downloads (duration: 01m 20s)
* 10:12 vgutierrez: upgrade ATS to version 8.0.8-1wm2 on cp3064 and cp3065
* 09:44 vgutierrez: upgrade ATS to version 8.0.8-1wm2 on cp5006 and cp5012
* 09:20 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 09:20 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 09:16 vgutierrez: upgrade ATS to version 8.0.8-1wm2 on cp4026 and cp4032
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1112', diff saved to https://phabricator.wikimedia.org/P12115 and previous config saved to /var/cache/conftool/dbconfig/20200729-091528-marostegui.json
* 09:15 vgutierrez: upload trafficserver 8.0.8-1wm2 to apt.wm.o (buster)
* 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112', diff saved to https://phabricator.wikimedia.org/P12114 and previous config saved to /var/cache/conftool/dbconfig/20200729-091319-marostegui.json
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112', diff saved to https://phabricator.wikimedia.org/P12113 and previous config saved to /var/cache/conftool/dbconfig/20200729-091006-marostegui.json
* 08:55 marostegui: The above was db1112
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1121', diff saved to https://phabricator.wikimedia.org/P12112 and previous config saved to /var/cache/conftool/dbconfig/20200729-085504-marostegui.json
* 08:42 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp2001.codfw.wmnet
* 08:26 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:24 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 08:05 marostegui: Deploy MCR schema change on db1121 (lag will show up on s4), also remove triggers on db1124:3314
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P12111 and previous config saved to /var/cache/conftool/dbconfig/20200729-080442-marostegui.json
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1141', diff saved to https://phabricator.wikimedia.org/P12110 and previous config saved to /var/cache/conftool/dbconfig/20200729-080318-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P12109 and previous config saved to /var/cache/conftool/dbconfig/20200729-075558-marostegui.json
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P12108 and previous config saved to /var/cache/conftool/dbconfig/20200729-074828-marostegui.json
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1141', diff saved to https://phabricator.wikimedia.org/P12107 and previous config saved to /var/cache/conftool/dbconfig/20200729-074414-marostegui.json
* 06:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:26 XioNoX: standardize mr1-eqiad interfaces
* 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P12106 and previous config saved to /var/cache/conftool/dbconfig/20200729-062224-marostegui.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1078', diff saved to https://phabricator.wikimedia.org/P12105 and previous config saved to /var/cache/conftool/dbconfig/20200729-062009-marostegui.json
* 06:16 XioNoX: standardize mr1-codfw interfaces
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P12104 and previous config saved to /var/cache/conftool/dbconfig/20200729-061450-marostegui.json
* 06:05 XioNoX: standardize mr1-ulsfo interfaces
* 06:01 legoktm: ssh doc1001.eqiad.wmnet sudo -u doc-uploader git -C /srv/docroot pull
* 05:52 XioNoX: standardize mr1-eqsin interfaces
* 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P12103 and previous config saved to /var/cache/conftool/dbconfig/20200729-050346-marostegui.json
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P12102 and previous config saved to /var/cache/conftool/dbconfig/20200729-050247-marostegui.json
* 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1142', diff saved to https://phabricator.wikimedia.org/P12101 and previous config saved to /var/cache/conftool/dbconfig/20200729-050204-marostegui.json
* 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1142', diff saved to https://phabricator.wikimedia.org/P12100 and previous config saved to /var/cache/conftool/dbconfig/20200729-045859-marostegui.json
* 02:19 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: re-enable lilypond in safe mode (duration: 01m 09s)
* 01:47 tstarling@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/Score/includes/Score.php: work around firejail bug (duration: 01m 07s)
* 01:45 tstarling@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Score/includes/Score.php: work around firejail bug (duration: 01m 08s)
* 01:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1048.eqiad.wmnet
* 01:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1047.eqiad.wmnet
* 00:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1046.eqiad.wmnet
* 00:48 ryankemper: sudo -E cumin -b 10 'A:wdqs-all' 'sudo run-puppet-agent'
* 00:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime


== 2020-07-28 ==
== 2021-07-30 ==
* 23:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:37 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: reduce mlr window size on enwiki (duration: 01m 05s)
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 23:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 23:34 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: cirrus: reduce mlr window size on enwiki (duration: 01m 06s)
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 23:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 23:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 23:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove unused setting $wgGEHomepageSuggestedEditsNewAccountInitiatedPercentage (no-op) (duration: 01m 06s)
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 22:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=wtp1046.eqiad.wmnet
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 22:19 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1044.eqiad.wmnet
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:27 dancy@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:24 dancy@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:17 dancy@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 20:36 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 20:02 eileen: process-control config revision is {{Gerrit|b6ece03513}}
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 19:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:25 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 19:25 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 19:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 19:24 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 19:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:23 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 19:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P12097 and previous config saved to /var/cache/conftool/dbconfig/20200728-191926-marostegui.json
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:12 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1147', diff saved to https://phabricator.wikimedia.org/P12096 and previous config saved to /var/cache/conftool/dbconfig/20200728-191237-marostegui.json
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 19:11 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@69bbbbb]: airflow: drop_old_data_daily: top_queries table renamed to fulltext_head_queries (duration: 00m 53s)
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 19:11 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@69bbbbb]: airflow: drop_old_data_daily: top_queries table renamed to fulltext_head_queries
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 19:09 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1147', diff saved to https://phabricator.wikimedia.org/P12095 and previous config saved to /var/cache/conftool/dbconfig/20200728-190933-marostegui.json
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 19:06 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1147', diff saved to https://phabricator.wikimedia.org/P12094 and previous config saved to /var/cache/conftool/dbconfig/20200728-190517-marostegui.json
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 19:03 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1147', diff saved to https://phabricator.wikimedia.org/P12093 and previous config saved to /var/cache/conftool/dbconfig/20200728-190137-marostegui.json
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 18:35 cdanis: ✔️ cdanis@lvs1015.eqiad.wmnet ~ 🕝☕ sudo ipvsadm -D -t 10.2.2.51:9283
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 18:29 cdanis: ❌cdanis@lvs1016.eqiad.wmnet ~ 🕝☕ sudo ipvsadm -D -t 10.2.2.51:9283
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 18:29 catrope@deploy1001: Synchronized php-1.36.0-wmf.2/extensions/GrowthExperiments/extension.json: Fix reference to MentorChangeLogFormatter ([[phab:T259041|T259041]]) (duration: 01m 05s)
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 18:20 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: No-op sync for wmgUseWikimediaApiPortal and wmgUseWikimediaApiPortalOAuth (2 of 2) (duration: 00m 58s)
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 18:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: No-op sync for wmgUseWikimediaApiPortal and wmgUseWikimediaApiPortalOAuth (1 of 2) (duration: 01m 05s)
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 18:16 cdanis: primary pybal restart ✔️ cdanis@lvs1015.eqiad.wmnet ~ 🕑☕ sudo systemctl restart pybal.service
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 18:14 cdanis: backup pybal restart: ✔️ cdanis@lvs1016.eqiad.wmnet ~ 🕑☕ sudo systemctl restart pybal.service
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 18:10 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 18:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 18:05 catrope@deploy1001: Synchronized php-1.36.0-wmf.2/includes/libs/filebackend/SwiftFileBackend.php: Fix index error in SwiftFileBackend ([[phab:T259023|T259023]]) (duration: 01m 07s)
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 17:46 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide:  (duration: 00m 05s)
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 17:46 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 17:41 volans: run apt-get clean on  wtp[1046,1048].eqiad.wmnet and wtp2001.codfw.wmnet to free ~`2GB as they were 100% - [[phab:T258775|T258775]]
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 17:33 XioNoX: standardize mr1-esams interfaces
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 17:30 brennen@deploy1001: sync aborted: (no justification provided) (duration: 28m 53s)
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 17:03 brennen: prior scap sync for https://gerrit.wikimedia.org/r/c/mediawiki/core/+/616842 ([[phab:T259023|T259023]])
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 17:02 brennen@deploy1001: Started scap: (no justification provided)
* 11:23 moritzm: installing libsndfile security updates on stretch
* 16:51 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@0982d4e]: convert_to_esbulk: repair variable ref before assign (duration: 04m 33s)
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 16:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 16:47 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 16:47 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@0982d4e]: convert_to_esbulk: repair variable ref before assign
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 16:45 XioNoX: remove mr1-codfw source NAT (not used)
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 16:43 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1045.eqiad.wmnet
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 16:39 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 16:36 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 16:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 16:33 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1035.eqiad.wmnet
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 16:32 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1034.eqiad.wmnet
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 16:31 XioNoX: mr1-eqiad# delete security nat source rule-set mgmt-to-untrust  (unused, no matching ACL)
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 16:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json
* 16:21 hnowlan: imported envoyproxy 1.15.0-1 deb into component/envoy-future for buster-wikimedia
* 16:11 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1042.eqiad.wmnet
* 16:09 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1043.eqiad.wmnet
* 15:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:51 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 15:50 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:48 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:48 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 15:45 jayme@cumin1001: conftool action : set/pooled=no; selector: name=wtp1035.*
* 15:44 jayme@cumin1001: conftool action : set/pooled=no; selector: name=wtp1034.*
* 15:35 ayounsi@deploy1001: Finished deploy [homer/deploy@5e999c8]: once more (duration: 03m 06s)
* 15:32 ayounsi@deploy1001: Started deploy [homer/deploy@5e999c8]: once more
* 15:32 ayounsi@deploy1001: Finished deploy [homer/deploy@5e999c8]: CR613642 (duration: 03m 38s)
* 15:31 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1045.eqiad.wmnet
* 15:30 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1041.eqiad.wmnet
* 15:30 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1044.eqiad.wmnet
* 15:29 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1039.eqiad.wmnet
* 15:28 ayounsi@deploy1001: Started deploy [homer/deploy@5e999c8]: CR613642
* 15:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:17 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 15:16 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:14 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:13 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:08 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR613642 (duration: 02m 14s)
* 15:06 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR613642
* 15:01 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR613642 (duration: 00m 11s)
* 15:01 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR613642
* 14:58 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1043.eqiad.wmnet
* 14:58 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1040.eqiad.wmnet
* 14:57 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:55 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1042.eqiad.wmnet
* 14:54 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1038.eqiad.wmnet
* 14:52 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:48 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:23 herron: bounced centrallog rsyslog services in codfw/eqiad
* 14:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:15 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P12087 and previous config saved to /var/cache/conftool/dbconfig/20200728-140313-marostegui.json
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P12086 and previous config saved to /var/cache/conftool/dbconfig/20200728-140249-marostegui.json
* 14:02 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1148', diff saved to https://phabricator.wikimedia.org/P12085 and previous config saved to /var/cache/conftool/dbconfig/20200728-140220-marostegui.json
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1148', diff saved to https://phabricator.wikimedia.org/P12084 and previous config saved to /var/cache/conftool/dbconfig/20200728-140207-marostegui.json
* 14:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 13:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:58 moritzm: installing perl security updates
* 13:56 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 13:56 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 13:55 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1041.eqiad.wmnet
* 13:55 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1037.eqiad.wmnet
* 13:50 godog: remove stale ipvs thanos-query service on port 80
* 13:39 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1040.eqiad.wmnet
* 13:38 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1036.eqiad.wmnet
* 13:38 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1039.eqiad.wmnet
* 13:37 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1035.eqiad.wmnet
* 13:37 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1038.eqiad.wmnet
* 13:36 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1034.eqiad.wmnet
* 13:29 godog: roll-restart pybal on eqiad lvs low-traffic to change port for thanos-query
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075', diff saved to https://phabricator.wikimedia.org/P12083 and previous config saved to /var/cache/conftool/dbconfig/20200728-132520-marostegui.json
* 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 with less weight', diff saved to https://phabricator.wikimedia.org/P12082 and previous config saved to /var/cache/conftool/dbconfig/20200728-132023-marostegui.json
* 13:09 godog: roll-restart pybal on lvs low-traffic to apply thanos-query changes
* 13:04 XioNoX: standardize cr3-esams interfaces
* 13:03 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.2
* 12:41 XioNoX: standardize cr2-esams interfaces
* 12:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:36 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 12:33 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075', diff saved to https://phabricator.wikimedia.org/P12081 and previous config saved to /var/cache/conftool/dbconfig/20200728-123201-marostegui.json
* 12:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:26 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 12:26 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:24 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 12:17 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1037.eqiad.wmnet
* 12:14 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1036.eqiad.wmnet
* 12:08 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1035.eqiad.wmnet
* 12:07 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1032.eqiad.wmnet
* 12:07 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1033.eqiad.wmnet
* 12:05 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1031.eqiad.wmnet
* 12:04 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1034.eqiad.wmnet
* 12:04 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: disabling lilypond rendering in Score again due to error running gs (duration: 01m 05s)
* 11:56 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: re-enabling Score in safe mode (duration: 01m 04s)
* 11:50 Urbanecm: EU B&C window done
* 11:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1a5672628b82709350ca74bb784197e7ff5fdc19}}: Add Turkish powered by MW and Wikimedia project icons ([[phab:T257732|T257732]]) (duration: 00m 59s)
* 11:46 urbanecm@deploy1001: Synchronized static/images/footer/: {{Gerrit|1a5672628b82709350ca74bb784197e7ff5fdc19}}: Add Turkish powered by MW and Wikimedia project icons ([[phab:T257732|T257732]]) (duration: 01m 01s)
* 11:43 urbanecm@deploy1001: Synchronized static/images: {{Gerrit|df9b9acf0876dad9b11d5641fe6fa174c7066f8b}}: Move footer logos to /static/images/footer ([[phab:T257732|T257732]]) (duration: 01m 02s)
* 11:38 marostegui: Deploy schema change on s3 codfw, this will generate lag on codfw [[phab:T256682|T256682]]
* 11:38 ema: A:cp-text varnish ban pt.wikiversity.org [[phab:T256750|T256750]]
* 11:37 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|df9b9acf0876dad9b11d5641fe6fa174c7066f8b}}: Move footer logos to /static/images/footer ([[phab:T257732|T257732]]) (duration: 00m 58s)
* 11:36 ema: A:cp-text varnish ban fr.wiktionary.org [[phab:T256750|T256750]]
* 11:35 urbanecm@deploy1001: Synchronized static/images/footer: {{Gerrit|df9b9acf0876dad9b11d5641fe6fa174c7066f8b}}: Move footer logos to /static/images/footer ([[phab:T257732|T257732]]) (duration: 01m 05s)
* 11:34 ema: A:cp-text varnish ban eu.wikipedia.org [[phab:T256750|T256750]]
* 11:32 ema: A:cp-text varnish ban he.wikipedia.org [[phab:T256750|T256750]]
* 11:30 marostegui: Deploy MCR change on db1143, db1148, db1146:3314
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12079 and previous config saved to /var/cache/conftool/dbconfig/20200728-113009-marostegui.json
* 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|04c7ef94bb7901668f2a8df3289b6a59d42f0a7e}}: Undeploy graphoid for phase 2 wikis ([[phab:T258463|T258463]]) (duration: 01m 00s)
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1143', diff saved to https://phabricator.wikimedia.org/P12078 and previous config saved to /var/cache/conftool/dbconfig/20200728-112850-marostegui.json
* 11:25 ema: A:cp-text varnish ban fa.wikipedia.org [[phab:T256750|T256750]]
* 11:21 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [cirrus] use more neutral config var names (duration: 01m 06s)
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P12077 and previous config saved to /var/cache/conftool/dbconfig/20200728-112046-marostegui.json
* 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P12076 and previous config saved to /var/cache/conftool/dbconfig/20200728-111522-marostegui.json
* 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1143', diff saved to https://phabricator.wikimedia.org/P12075 and previous config saved to /var/cache/conftool/dbconfig/20200728-111226-marostegui.json
* 11:11 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:10 jdrewniak@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:614890 desktop improvements by default for testing group (round 2) (T254227)]] (duration: 01m 06s)
* 11:09 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 11:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:07 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 10:56 hashar@deploy1001: Finished deploy [integration/docroot@ba85bdf]: Catch up with HEAD and support DOCUMENT_ROOT being a symbolic link for [[phab:T149924|T149924]] (duration: 00m 06s)
* 10:56 hashar@deploy1001: Started deploy [integration/docroot@ba85bdf]: Catch up with HEAD and support DOCUMENT_ROOT being a symbolic link for [[phab:T149924|T149924]]
* 10:55 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:53 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 10:50 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1033.eqiad.wmnet
* 10:48 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1030.eqiad.wmnet
* 10:48 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1029.eqiad.wmnet
* 10:47 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1032.eqiad.wmnet
* 10:47 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1028.eqiad.wmnet
* 10:33 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1031.eqiad.wmnet
* 10:32 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1027.eqiad.wmnet
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082', diff saved to https://phabricator.wikimedia.org/P12074 and previous config saved to /var/cache/conftool/dbconfig/20200728-102342-marostegui.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12072 and previous config saved to /var/cache/conftool/dbconfig/20200728-100412-marostegui.json
* 09:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 09:55 XioNoX: standardize cr2-esams interfaces
* 09:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:50 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 09:49 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:47 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 09:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:43 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 09:40 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:38 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 09:35 moritzm: imported libmysqlclient18 to component/cloudera [[phab:T258768|T258768]]
* 09:31 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1030.eqiad.wmnet
* 09:28 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1029.eqiad.wmnet
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12070 and previous config saved to /var/cache/conftool/dbconfig/20200728-092606-marostegui.json
* 09:24 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1028.eqiad.wmnet
* 09:19 XioNoX: standardize cr3-eqsin interfaces
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12069 and previous config saved to /var/cache/conftool/dbconfig/20200728-091849-marostegui.json
* 09:18 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp1027.eqiad.wmnet
* 09:10 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1026.eqiad.wmnet
* 09:07 ema: cp3050: restart varnishmtail.service, stuck on "Condition(c->offset <= c->vtx->len) not true."
* 08:39 XioNoX: standardize cr2-eqsin interfaces
* 08:38 godog: temporary downgrade prometheus-snmp-exporter on netmon2001
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12067 and previous config saved to /var/cache/conftool/dbconfig/20200728-083336-marostegui.json
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P12066 and previous config saved to /var/cache/conftool/dbconfig/20200728-083209-marostegui.json
* 08:20 liw@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.2 (duration: 53m 11s)
* 08:09 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:07 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 08:06 godog: failover librenms/smokeping to netmon2001 - [[phab:T247967|T247967]]
* 08:04 marostegui: Reduce labsdb1009 weight
* 07:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:48 jayme: depooled wtp1026.eqiad.wmnet for reimage
* 07:48 moritzm: switched superset to CAS
* 07:47 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 07:46 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 07:43 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:31 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=wtp1025.eqiad.wmnet
* 07:27 liw@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.2
* 07:03 liw: 1.36.0-wmf.2 was branched at {{Gerrit|04e863fdf3646ee6ed5c05b784f85c9f323e1f19}} for [[phab:T257970|T257970]]
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12065 and previous config saved to /var/cache/conftool/dbconfig/20200728-051928-marostegui.json
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3314 and restore db1146:3314 original weight', diff saved to https://phabricator.wikimedia.org/P12064 and previous config saved to /var/cache/conftool/dbconfig/20200728-051813-marostegui.json
* 02:17 eileen: process-control config revision is {{Gerrit|6811ca294a}} - just delayed silverpop_daily a bit as clashing with dedupe
* 00:18 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephmon1003.eqiad.wmnet
* 00:17 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephmon1003.eqiad.wmnet


== 2020-07-27 ==
== 2021-07-29 ==
* 23:49 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ac8e5d0]: airflow: head queries report, managed variables, refinery-drop-hive-partitions support (duration: 00m 54s)
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 23:48 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ac8e5d0]: airflow: head queries report, managed variables, refinery-drop-hive-partitions support
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 23:28 mutante: otrs1001 - ran puppet (it was alerting in icinga that puppet failed, but it was neither disabled nor failing and changed nothing when it ran)
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 21:31 sbassett@deploy1001: Synchronized wmf-config/CommonSettings.php: Deployed CentralNotice CSP conifg change for [[phab:T258459|T258459]] (duration: 00m 57s)
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 21:10 sbassett: Deployed mitigations for [[phab:T238075|T238075]]
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 20:41 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/InterwikiSorting/: {{Gerrit|c5f6c97856a5dbe673064afd2804bebb9b787580}}: Use LanguageLinksHook to sort interwiki links ([[phab:T257625|T257625]]) (duration: 00m 59s)
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 19:50 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 19:44 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 19:36 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 19:23 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 19:19 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'sessionstore' for release 'production' .
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 19:11 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'sessionstore' for release 'staging' .
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:06 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 19:00 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'echostore' for release 'production' .
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 18:57 urbanecm@deploy1001: sync-file aborted: {{Gerrit|3833b135caf4171daa0814eba81393b6c44db619}}: Move footer logos to /static/images/footer ([[phab:T257732|T257732]]) (duration: 00m 04s)
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 18:50 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|c6a9674366d9c8d273ce0e74dfb6a04c91d64307}}: Move footer logos to wmg* variables ([[phab:T257732|T257732]]) (duration: 00m 56s)
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 18:50 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 00m 57s)
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 18:49 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 18:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c6a9674366d9c8d273ce0e74dfb6a04c91d64307}}: Move footer logos to wmg* variables ([[phab:T257732|T257732]]) (duration: 00m 57s)
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 18:29 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable desktop web UI click tracking instrumentation on frwiki, hewiki, fawiki ([[phab:T258058|T258058]]) (duration: 00m 56s)
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 18:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove WPBSkinBlacklist ([[phab:T254675|T254675]]) (duration: 00m 57s)
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 17:42 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.1
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 17:30 liw: promoting train to group2
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 17:14 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 17:14 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 17:14 dpifke@deploy1001: Finished deploy [performance/arc-lamp@f14888b]: Deploying arclamp-compress-logs ([[phab:T235456|T235456]]) (duration: 00m 05s)
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 17:14 dpifke@deploy1001: Started deploy [performance/arc-lamp@f14888b]: Deploying arclamp-compress-logs ([[phab:T235456|T235456]])
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:59 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephmon1002.eqiad.wmnet
* 14:11 vgutierrez: restart pybal on lvs2009
* 16:58 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephmon1002.eqiad.wmnet
* 14:09 vgutierrez: restart pybal on lvs2010
* 16:57 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephmon1002.eqiad.wmnet
* 14:07 vgutierrez: restart pybal on lvs2008
* 16:50 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephosd1003.eqiad.wmnet
* 14:05 vgutierrez: restart pybal on lvs2007
* 16:50 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephosd1002.eqiad.wmnet
* 13:59 vgutierrez: restart pybal on lvs1014
* 16:50 andrew@cumin1001: conftool action : set/pooled=inactive; selector: name=cloudcephosd1001.eqiad.wmnet
* 13:55 vgutierrez: restart pybal on lvs1015
* 16:50 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephmon1003.eqiad.wmnet
* 13:52 _joe_: restarting pybal on lvs1016
* 16:50 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephmon1002.eqiad.wmnet
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 16:49 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephmon1001.eqiad.wmnet
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 16:48 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephosd1003.wikimedia.org
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 16:48 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephosd1002.wikimedia.org
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 16:48 andrew@cumin1001: conftool action : set/pooled=no; selector: name=cloudcephosd1001.wikimedia.org
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 16:48 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephosd1001.eqiad.wmnet
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 16:48 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephosd1002.eqiad.wmnet
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 16:47 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cloudcephosd1003.eqiad.wmnet
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 16:44 andrew@cumin1001: conftool action : set/pooled=yes; selector: name=cumin1001.eqiad.wmnet
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2087:3316, db2087:3317 after on-site maintenance [[phab:T258587|T258587]]', diff saved to https://phabricator.wikimedia.org/P12063 and previous config saved to /var/cache/conftool/dbconfig/20200727-163311-marostegui.json
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 16:05 marostegui: Will show up on labsdb hosts for s5
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 16:04 marostegui: Stop MySQL on db1082 for onsite maintenance - [[phab:T258910|T258910]]
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 15:03 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 15:03 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db1146:3314 weight while db1144:3314 is depooled', diff saved to https://phabricator.wikimedia.org/P12060 and previous config saved to /var/cache/conftool/dbconfig/20200727-145010-marostegui.json
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 14:48 marostegui: Deploy MCR change on db1144:3314
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12059 and previous config saved to /var/cache/conftool/dbconfig/20200727-144807-marostegui.json
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1149', diff saved to https://phabricator.wikimedia.org/P12058 and previous config saved to /var/cache/conftool/dbconfig/20200727-144034-marostegui.json
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:52 moritzm: restarting Tomcat on idp-test
* 14:19 XioNoX: standardize cr1-codfw interfaces
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 14:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 14:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 14:04 jayme@cumin1001: START - Cookbook sre.hosts.downtime
* 13:57 moritzm: upgrading idp2001 to CAS 6.1.7.1
* 13:19 XioNoX: standardize some cr2-esams interfaces
* 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'More weight to db1089 in main traffic', diff saved to https://phabricator.wikimedia.org/P12057 and previous config saved to /var/cache/conftool/dbconfig/20200727-131123-marostegui.json
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 with normal weight and pool db1089 into vslow', diff saved to https://phabricator.wikimedia.org/P12056 and previous config saved to /var/cache/conftool/dbconfig/20200727-130954-marostegui.json
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12055 and previous config saved to /var/cache/conftool/dbconfig/20200727-130713-marostegui.json
* 13:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 with less weight', diff saved to https://phabricator.wikimedia.org/P12054 and previous config saved to /var/cache/conftool/dbconfig/20200727-125824-marostegui.json
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12053 and previous config saved to /var/cache/conftool/dbconfig/20200727-125351-marostegui.json
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3311 with less weight', diff saved to https://phabricator.wikimedia.org/P12052 and previous config saved to /var/cache/conftool/dbconfig/20200727-125207-marostegui.json
* 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12051 and previous config saved to /var/cache/conftool/dbconfig/20200727-125045-marostegui.json
* 12:41 marostegui: Compress innodb on db1106, this will generate lag on enwiki on labsdb hosts (wiki replicas) [[phab:T254462|T254462]]
* 12:38 moritzm: disable puppet on idp1001/2001
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 and pool db1105:3311 as vslow [[phab:T254462|T254462]]', diff saved to https://phabricator.wikimedia.org/P12050 and previous config saved to /var/cache/conftool/dbconfig/20200727-123833-marostegui.json
* 12:37 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
* 12:37 akosiaris@cumin1001: conftool action : set/weight=0; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
* 12:37 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
* 12:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
* 12:36 akosiaris@cumin1001: conftool action : set/weight=0; selector: dc=eqiad,service=mobileapps,name=scb1001.eqiad.wmnet
* 12:31 XioNoX: standardize cr2-codfw interfaces
* 12:28 volans@deploy1001: Finished deploy [debmonitor/deploy@25dbd20]: Release v0.2.7 (duration: 00m 27s)
* 12:28 volans@deploy1001: Started deploy [debmonitor/deploy@25dbd20]: Release v0.2.7
* 12:25 jbond42: upload new cas package to buster-wikimedia
* 12:25 jbond42: upload new cas package
* 12:23 ema: A:cp rolling varnish-frontend restart to actually discard old VCL still pointing at varnishcheck/check [[phab:T255015|T255015]] [[phab:T236754|T236754]]
* 12:21 moritzm: installing ruby-json security updates
* 12:16 moritzm: installing batik security updates
* 11:59 marostegui: Deploy MCR schema change on db1149
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12049 and previous config saved to /var/cache/conftool/dbconfig/20200727-115818-marostegui.json
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1138', diff saved to https://phabricator.wikimedia.org/P12048 and previous config saved to /var/cache/conftool/dbconfig/20200727-115739-marostegui.json
* 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1138', diff saved to https://phabricator.wikimedia.org/P12047 and previous config saved to /var/cache/conftool/dbconfig/20200727-115258-marostegui.json
* 11:28 moritzm: installing an-tool1009 [[phab:T258768|T258768]]
* 10:54 ema: upload atskafka 0.10 to buster-wikimedia, upgrade cp3050 [[phab:T254317|T254317]]
* 10:46 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:616463{{!}} Bumping portals to master (616463)]] (duration: 01m 05s)
* 10:45 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:616463{{!}} Bumping portals to master (616463)]] (duration: 01m 10s)
* 10:33 XioNoX: make cr*-ulsfo interfaces netbox compliant
* 08:39 XioNoX: push "Add 185.71.138.0/24 to wikimedia4" to all routers
* 07:00 marostegui: Deploy schema change on s5 codfw [[phab:T256682|T256682]]
* 06:44 elukey: truncate big log file on an-launcher1002 that is filling up the /srv partition
* 06:36 elukey: apt-get clean on netbox1001 to free some space
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138 for MCR schema change', diff saved to https://phabricator.wikimedia.org/P12043 and previous config saved to /var/cache/conftool/dbconfig/20200727-051156-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316, db2087:3317 for on-site maintenance [[phab:T258587|T258587]]', diff saved to https://phabricator.wikimedia.org/P12042 and previous config saved to /var/cache/conftool/dbconfig/20200727-050058-marostegui.json
* 04:58 marostegui: Stop MySQL on db2087 for on-site maintenance [[phab:T258587|T258587]]


== 2020-07-25 ==
== 2021-07-28 ==
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1096:3315 into s5 api afte db1082 crashed [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P12041 and previous config saved to /var/cache/conftool/dbconfig/20200725-124104-marostegui.json
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 09:16 oblivian@cumin1001: dbctl commit (dc=all): 'Depool db1082 [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P12040 and previous config saved to /var/cache/conftool/dbconfig/20200725-091616-oblivian.json
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 01:52 mutante: ganeti - also removing (unmounted) disk 2 (100G) from webperf1002. [[phab:T257931|T257931]]
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 00:46 mutante: ganeti - removing disk 3 (20G) from webperf1002. the disks are 0-indexed, so the ones actually mounted are 0 (50G) and 1 (300G) ([[phab:T257931|T257931]])
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 00:42 dpifke: Manually compressing some more data on webperf1002, using arclamp-compress-logs from https://gerrit.wikimedia.org/r/c/performance/arc-lamp/+/615904.
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 13:29 moritzm: installing python2.7 security updates on stretch
* 13:08 moritzm: installing python3.5 security updates on stretch
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:27 moritzm: installing nginx security updates on thumbor*
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 08:27 Amir1: running several long-running queries against pc1007
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 07:53 moritzm: installing aspell security updates on stretch
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php


== 2020-07-24 ==
== 2021-07-27 ==
* 23:00 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 20:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 20:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 19:57 dpifke: Manually gzipping some older ArcLamp data on webperf1002, to free up space and verify new compression support.
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 19:55 dpifke@deploy1001: Finished deploy [performance/arc-lamp@772b4a3]: Deploy CLs 611465 and 613740 to add compression support to ArcLamp (duration: 00m 05s)
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 19:55 dpifke@deploy1001: Started deploy [performance/arc-lamp@772b4a3]: Deploy CLs 611465 and 613740 to add compression support to ArcLamp
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 16:55 Amir1: deployment done
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 16:49 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Wikibase/repo/includes/RepoHooks.php: [[gerrit:616032{{!}}Prevent onTitleGetRestrictionTypes changing ns0 protections]], Part II (duration: 01m 07s)
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 16:47 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Wikibase/repo/includes/WikibaseRepo.php: [[gerrit:616032{{!}}Prevent onTitleGetRestrictionTypes changing ns0 protections]], Part I (duration: 01m 06s)
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 15:06 reedy@deploy1001: Finished scap: Score backports (duration: 36m 50s)
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 14:30 reedy@deploy1001: Started scap: Score backports
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 13:31 XioNoX: advertise 185.71.138.0/24 from AMS
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:17 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:00 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.1/includes/import/ImportableOldRevisionImporter.php: [[gerrit:616029{{!}}Import: use master DB for loading slots.]] ([[phab:T258666|T258666]]) (duration: 01m 07s)
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 12:34 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 12:04 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 11:48 hnowlan: bootstrapped restbase-dev1004-b
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 11:13 hnowlan: started bootstrap of restbase-dev1004-a
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 10:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 10:49 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 10:35 hnowlan: started reimage of restbase-dev1004
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 09:59 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 09:48 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 08:40 kormat: restarting mariadb on all sanitarium hosts [[phab:T258711|T258711]]
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 08:35 akosiaris: start nagios-nrpe-server on kubernetes2002
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 07:44 elukey: depool wtp1025 - disk full
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 06:30 tstarling@deploy1001: Started scap: for Score
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 02:36 tstarling@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Score/includes/Score.php: removing superseded local patch for hard-coding lilypond version (duration: 01m 09s)
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 01:19 ejegg: updated payments-wiki from {{Gerrit|31a3de1130}} to {{Gerrit|c365c136d2}}
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 01:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 01:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 01:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 00:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 00:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 00:46 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 00:46 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 00:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 00:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 00:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 00:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 00:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 00:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 00:43 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 00:42 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 00:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 00:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 00:15 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 00:14 andrew@cumin1001: START - Cookbook sre.hosts.decommission
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 14:33 mmandere: depool cp10[79-82]).eqiad.wmnet - [[phab:T286061|T286061]]
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:11 moritzm: installing aspell security updates
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support.  Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 11:23 Lucas_WMDE: EU backport+config window done
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 08:57 _joe_: repooling mw225[12] for apis
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 08:36 jynus: reenabled puppet on mwmaint1002
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 07:52 jynus: disabling puppet on mwmaint1002
* 07:14 moritzm: installing krb security updates on buster
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json


== 2020-07-23 ==
== 2021-07-26 ==
* 23:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 23:30 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}},
* 23:30 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 23:30 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 23:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 23:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 23:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:06 legoktm: depooled mw2336.codfw.mwnet, mgmt is down too. [[phab:T287394|T287394]]
* 23:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 22:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 22:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 22:52 mutante: stashbot quadruple log test
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 22:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 22:51 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'echostore' for release 'staging' .
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 22:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 22:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 22:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 22:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 22:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 22:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 21:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 21:45 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 21:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 21:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 21:21 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@c99c626]: airflow: centralize installation specific airflow Variables (duration: 00m 34s)
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 21:20 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@c99c626]: airflow: centralize installation specific airflow Variables
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 21:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 20:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 06:39 moritzm: installing krb5 security updates
* 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki
* 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 20:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 19:13 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:11 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 19:09 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 18:51 ryankemper: restarted blazegraph on codfw wdqs2001
* 18:44 ryankemper: Restarted blazegraph on following codfw wdqs nodes: 2007, 2003, and 2002
* 18:39 Amir1: BACC is done
* 18:29 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:613235{{!}}Load WikibaseClient from extension.json file instead of php one (T257437 T256228 T88258)]] (duration: 01m 05s)
* 18:21 mutante: testreduce1001 - rm -rf /srv/testreduce and run puppet to re-clone testreduce to it from the scandium branch ([[phab:T257906|T257906]])
* 18:13 ryankemper: restarted blazegraph on 2001
* 17:59 ryankemper: sudo -E cumin -b 10 'A:wdqs-all and not A:wdqs-test and not P<nowiki>{</nowiki>wdqs1003.eqiad.wmnet<nowiki>}</nowiki> and not P<nowiki>{</nowiki>wdqs2001.codfw.wmnet<nowiki>}</nowiki>' 'sudo systemctl restart wdqs-blazegraph.service'
* 17:53 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin -b10 'wdqs*' "run-puppet-agent --unless-version 1a4ae81"
* 17:52 cdanis@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs.*,name=codfw
* 17:35 cdanis@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs.*,name=codfw
* 17:22 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 16:57 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 16:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 15:36 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 01m 05s)
* 13:49 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=.*
* 12:29 marostegui: Decrease labsdb1009 weight a bit, as it is lagging again.
* 12:23 XioNoX: remove bogus lo0 IPs from cr3-knams
* 12:21 Urbanecm: Stagging at mwdebug1001 ended, run scap pull to clean changes
* 12:17 Urbanecm: Stagging at mwdebug1001 again
* 12:02 Urbanecm: Stagging at mwdebug1001 ended, run scap pull to clean changes
* 12:00 Urbanecm: Stagging at mwdebug1001
* 11:49 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|745ff20f53e4914cf6e1717c963419e74b68e693}}: Log ClosedWikiProviders start with info level ([[phab:T258695|T258695]]) (duration: 01m 05s)
* 11:48 marostegui: Deploy MCR schema change on db1145:3314
* 11:36 dcausse: European mid-day backport window done
* 11:31 dcausse@deploy1001: Synchronized php-1.36.0-wmf.1/extensions/Wikibase: [[phab:T258507|T258507]]: Fix bug that causes wrong prefixes in RDF output (duration: 01m 11s)
* 11:18 akosiaris: depool scb in mobileapps/eqiad. [[phab:T218733|T218733]]
* 11:17 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=mobileapps,name=scb.*
* 11:13 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T258474|T258474]]: [sdoc] fix entity source base URIs (duration: 01m 07s)
* 10:27 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=mobileapps,name=scb.*
* 10:27 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=eqiad,service=mobileapps,name=scb*
* 10:25 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=mobileapps,name=scb1002.*
* 10:24 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,service=mobileapps,name=scb1001.*
* 10:18 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:14 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:11 akosiaris: poole kubernetes in mobileapps/eqiad. [[phab:T218733|T218733]]
* 10:11 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,service=mobileapps,name=kubernetes.*
* 10:06 volans@deploy1001: Finished deploy [debmonitor/deploy@16d0c45]: Release v0.2.6 (duration: 00m 36s)
* 10:06 volans@deploy1001: Started deploy [debmonitor/deploy@16d0c45]: Release v0.2.6
* 10:05 volans@deploy1001: Finished deploy [debmonitor/deploy@44aa1ee]: Release v0.2.6 (duration: 00m 14s)
* 10:05 volans@deploy1001: Started deploy [debmonitor/deploy@44aa1ee]: Release v0.2.6
* 10:04 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 09:51 akosiaris: prepare for pooling kubernetes mobileapps capacity in eqiad. [[phab:T218733|T218733]]
* 09:51 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,service=mobileapps,name=kubernetes.*
* 09:46 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:40 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 09:38 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 09:27 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:27 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 09:25 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 09:24 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 09:20 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 09:19 akosiaris: lower replica count back to 80 for mobileapps. [[phab:T218733|T218733]]
* 09:19 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:19 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 09:02 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 08:59 marostegui: transfer --type=xtrabackup from db1117:3322 to db1107 [[phab:T257540|T257540]]
* 08:45 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:42 godog: test librenms poller from netmon2001
* 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:40 XioNoX: remove pim-rp IPs from last routers - [[phab:T257573|T257573]]
* 08:40 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:29 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1107 from s1 [[phab:T257540|T257540]]', diff saved to https://phabricator.wikimedia.org/P12025 and previous config saved to /var/cache/conftool/dbconfig/20200723-082647-marostegui.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to move it to m2 [[phab:T257540|T257540]]', diff saved to https://phabricator.wikimedia.org/P12024 and previous config saved to /var/cache/conftool/dbconfig/20200723-081650-marostegui.json
* 05:29 marostegui: Restore labsdb1009's original weight
* 00:24 legoktm@deploy1001: Synchronized php-1.35.0-wmf.41/includes/: [[phab:T258664|T258664]]: Revert "Add a new type of database to the installer from extension" (2/2) (duration: 01m 08s)
* 00:22 legoktm@deploy1001: Synchronized php-1.35.0-wmf.41/includes/libs/rdbms/database/Database.php: [[phab:T258664|T258664]]: Revert "Add a new type of database to the installer from extension" (duration: 01m 05s)
* 00:20 legoktm@deploy1001: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org)
* 00:16 legoktm@deploy1001: Synchronized php-1.36.0-wmf.1/includes/: [[phab:T258664|T258664]]: Revert "Add a new type of database to the installer from extension" (duration: 01m 09s)
* 00:11 legoktm@deploy1001: scap failed: average error rate on 3/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)


== 2020-07-22 ==
== 2021-07-24 ==
* 22:07 cdanis: remove downtime on api.svc.codfw.wmnet [[phab:T258614|T258614]]
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
* 19:26 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.1 (duration: 01m 03s)
* 19:25 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.1
* 19:15 urbanecm@deploy1001: Finished scap: {{Gerrit|9529cf8d2570bbf6dd1e919c966f5954e39dbd67}}: {{Gerrit|b66ec9143bd96cbf3a20b70f6aa3f2d6d7963bb5}}: OOUI backport; {{Gerrit|93755a6a92923ae390e3a04b19421c8562568d2a}}: i18n changes for OAuth, removal of spam messages (duration: 42m 26s)
* 19:14 ejegg: updated payments-wiki from {{Gerrit|bf91f8adff}} to {{Gerrit|31a3de1130}}
* 19:11 mutante: mw2335 - mw2339 - scap pull
* 18:39 dzahn@cumin1001: conftool action : set/weight=15; selector: name=mw233[5-9].codfw.wmnet
* 18:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw233[6-9].codfw.wmnet
* 18:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw233[6-9].codfw.wmnet
* 18:33 urbanecm@deploy1001: Started scap: {{Gerrit|9529cf8d2570bbf6dd1e919c966f5954e39dbd67}}: {{Gerrit|b66ec9143bd96cbf3a20b70f6aa3f2d6d7963bb5}}: OOUI backport; {{Gerrit|93755a6a92923ae390e3a04b19421c8562568d2a}}: i18n changes for OAuth, removal of spam messages
* 18:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2335.codfw.wmnet
* 18:28 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw233[5-9].codfw.wmnet
* 18:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2339.codfw.wmnet
* 17:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2338.codfw.wmnet
* 17:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2337.codfw.wmnet
* 17:58 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 17:26 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2335.codfw.wmnet
* 15:31 moritzm: updated stretch installer image to Stretch 9.13 release [[phab:T258407|T258407]]
* 15:27 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 15:27 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:52 XioNoX: add accept-data and remove bogus v6 IP from ulsfo sandbox vlan
* 14:43 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=mobileapps,name=scb.*
* 14:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:35 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 14:35 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 14:12 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:12 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:06 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:04 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 13:54 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 13:54 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:50 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:49 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:36 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:36 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:34 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 13:33 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:20 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:19 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:18 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:18 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:16 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:16 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 12:36 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
* 12:32 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 12:28 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=mobileapps,name=scb.*
* 12:20 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 12:18 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 12:17 akosiaris@cumin1001: conftool action : set/weight=0; selector: dc=codfw,service=mobileapps,name=scb.*
* 12:05 ema: A:cp-text varnish ban ptwikiversity [[phab:T256750|T256750]]
* 12:01 ema: A:cp-text varnish ban frwiktionary [[phab:T256750|T256750]]
* 11:56 ema: A:cp-text varnish ban euwiki [[phab:T256750|T256750]]
* 11:54 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=mobileapps,name=scb.*
* 11:54 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 11:54 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 11:52 Urbanecm: EU B&C window done
* 11:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=mobileapps,name=scb.*
* 11:49 ema: A:cp-text force puppet run to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/615446 [[phab:T256750|T256750]]
* 11:48 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 15s)
* 11:42 jdrewniak@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:614889{{!}}Enable desktop improvements by default for testing group (round 1) (T254227)]] (duration: 01m 05s)
* 11:30 jdrewniak@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:614888{{!}}Enable instrumentation for wikis in the desktop improvements testing group (T254228)]] (duration: 01m 04s)
* 11:30 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 11:30 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 11:28 jdrewniak@deploy1001: Synchronized wmf-config/config: Config: [[gerrit:614888{{!}}Enable instrumentation for wikis in the desktop improvements testing group (T254228)]] (duration: 01m 05s)
* 11:20 jdrewniak@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Config: [[gerrit:614888{{!}}Enable instrumentation for wikis in the desktop improvements testing group (T254228)]] (duration: 01m 05s)
* 11:18 jdrewniak@deploy1001: Synchronized dblists/desktop-improvements.dblist: Config: [[gerrit:614888{{!}}Enable instrumentation for wikis in the desktop improvements testing group (T254228)]] (duration: 01m 18s)
* 11:13 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 11:13 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 10:39 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:24 jbond42: upload prometheus-swagger-exporter_0.3-1+deb10u1 to apt1001 buster repo
* 10:24 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 10:22 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 10:19 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 10:19 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:12 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=mobileapps,name=scb.*
* 10:08 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=mobileapps,name=scb.*
* 10:04 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:01 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 09:58 marostegui: Deploy MCR schema change on s4 codfw master (lag will appear on codfw) - [[phab:T238966|T238966]]
* 09:55 akosiaris: bump memory in codfw mobileapps another 20% [[phab:T218733|T218733]]
* 09:55 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:55 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 09:52 godog: centrallog1001 lvextend /srv by 130G
* 09:51 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:46 akosiaris: codfw mobileapps kubernetes traffic back to 96% [[phab:T218733|T218733]] again. scb pooled again.
* 09:46 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=mobileapps,name=scb.*
* 09:43 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 09:43 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 09:43 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:40 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 09:40 akosiaris: increase codfw mobileapps kubernetes traffic to 100% [[phab:T218733|T218733]]
* 09:40 akosiaris@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,service=mobileapps,name=scb.*
* 09:34 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 09:27 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:27 akosiaris@deploy2001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 09:25 akosiaris: bump memory limits for mobileapps by 25% [[phab:T218733|T218733]]
* 09:25 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 09:10 jayme: updated docker-report to 0.0.7-1 on deneb
* 09:09 jayme: import docker-report 0.0.7-1 to buster-wikimedia
* 09:06 gehel: restarting blazegraph on all wdqs nodes - new vocabulary
* 08:48 dcausse: restarting blazegraph on wdqs1010 (testing new vocab)
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126', diff saved to https://phabricator.wikimedia.org/P12017 and previous config saved to /var/cache/conftool/dbconfig/20200722-084613-marostegui.json
* 08:42 kormat@cumin1001: dbctl commit (dc=all): 'Increase es1020 to 100% pooled in es4, reduce es1021 to weight 0 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P12016 and previous config saved to /var/cache/conftool/dbconfig/20200722-084159-kormat.json
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12015 and previous config saved to /var/cache/conftool/dbconfig/20200722-083926-marostegui.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1084 and db1107', diff saved to https://phabricator.wikimedia.org/P12014 and previous config saved to /var/cache/conftool/dbconfig/20200722-083535-marostegui.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12013 and previous config saved to /var/cache/conftool/dbconfig/20200722-083140-marostegui.json
* 08:30 kart_: Updated cxserver to 2020-07-20-200559-production ([[phab:T257674|T257674]])
* 08:28 kartik@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 08:25 kartik@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 08:25 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 and db1107', diff saved to https://phabricator.wikimedia.org/P12012 and previous config saved to /var/cache/conftool/dbconfig/20200722-082309-marostegui.json
* 08:22 kartik@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12010 and previous config saved to /var/cache/conftool/dbconfig/20200722-082023-marostegui.json
* 08:19 volans@cumin1001: START - Cookbook sre.dns.netbox
* 08:16 akosiaris: increase codfw mobileapps kubernetes traffic to 96% [[phab:T218733|T218733]]. Take #2. Let's see if I can reproduce the weird increases in p99 latencies and figure out their cause
* 08:15 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
* 08:14 kormat@cumin1001: dbctl commit (dc=all): 'Increase es1020 to 75% pooled in es4, reduce es1021 to weight 25 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P12009 and previous config saved to /var/cache/conftool/dbconfig/20200722-081457-kormat.json
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 and db1107', diff saved to https://phabricator.wikimedia.org/P12008 and previous config saved to /var/cache/conftool/dbconfig/20200722-081330-marostegui.json
* 08:12 moritzm: Turnilo switched to CAS
* 08:05 jayme: updated docker-report to 0.0.6-1 on deneb
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 and db1107', diff saved to https://phabricator.wikimedia.org/P12007 and previous config saved to /var/cache/conftool/dbconfig/20200722-075749-marostegui.json
* 07:53 kormat@cumin1001: dbctl commit (dc=all): 'Increase es1020 to 50% pooled in es4 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P12006 and previous config saved to /var/cache/conftool/dbconfig/20200722-075312-kormat.json
* 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1084 to s1, depooled [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P12005 and previous config saved to /var/cache/conftool/dbconfig/20200722-075040-marostegui.json
* 07:49 jayme: import docker-report 0.0.6-1 to buster-wikimedia
* 07:40 jynus: stop db1145 for hw maintenance [[phab:T258249|T258249]]
* 06:47 elukey: update analytics-in4/6 filters on cr1/cr2 eqiad (ref https://gerrit.wikimedia.org/r/c/operations/homer/public/+/614702)
* 06:26 marostegui: Stop MySQL on db1107
* 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 to clone db1084', diff saved to https://phabricator.wikimedia.org/P12003 and previous config saved to /var/cache/conftool/dbconfig/20200722-060432-marostegui.json
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P12002 and previous config saved to /var/cache/conftool/dbconfig/20200722-051607-marostegui.json


== 2020-07-21 ==
== 2021-07-23 ==
* 23:37 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump cirrus MLR models to latest (duration: 01m 06s)
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 23:13 Urbanecm: Evening backport window done
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 23:12 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|7a50168d54b5e86834606fb8d7880eb3a923ffd5}}: Updating UploadWizard template: PD-old-70-1923->PD-old-70-expired ([[phab:T258523|T258523]]) (duration: 01m 06s)
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 23:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7acc9d966a07d589bb6aed5f801c9e1defc75fe1}}: Enable $wgWatchlistExpiry on testwiki ([[phab:T257506|T257506]]) (duration: 01m 08s)
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 19:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.1
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 19:02 catrope@deploy1001: Synchronized php-1.36.0-wmf.1/includes/Storage/PageUpdater.php: Fix handling of null edits ([[phab:T257766|T257766]]) (duration: 01m 06s)
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 19:01 catrope@deploy1001: Synchronized php-1.35.0-wmf.41/includes/Storage/PageUpdater.php: Fix handling of null edits ([[phab:T257766|T257766]]) (duration: 01m 11s)
* 16:15 effie: enable puppet on mc-gp* hosts
* 18:33 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.1 (duration: 41m 22s)
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 18:27 ejegg: restored new URL for TY page in payments-wiki settings
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 18:22 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1] (thin): Redeploying to unbreak unique devices per domain monthly THIN [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 00m 07s)
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:22 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1] (thin): Redeploying to unbreak unique devices per domain monthly THIN [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 18:21 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - third try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 00m 12s)
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 18:21 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - third try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 18:17 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - second try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 00m 17s)
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 18:16 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - second try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 18:13 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 05m 32s)
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 18:08 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 17:52 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.1
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 17:50 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 17:45 volans@cumin1001: START - Cookbook sre.dns.netbox
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 17:10 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.39 (duration: 16m 25s)
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 16:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase, take 2 (duration: 04m 54s)
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 16:27 ppchelko@deploy1001: Started deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase, take 2
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 16:27 ppchelko@deploy1001: Finished deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase (duration: 10m 37s)
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 16:21 longma: 1.36.0-wmf.1 was branched at {{Gerrit|3a1faac3764ecae8dde813bd67a5a8e8f4975a85}} for [[phab:T257969|T257969]]
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 16:16 ppchelko@deploy1001: Started deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 15:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 15:10 moritzm: draining restbase1027 for eventual reboot for kernel security update
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 15:09 godog: poweroff ms-be1024 for bbu replacement - [[phab:T257949|T257949]]
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 15:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 15:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 15:01 vgutierrez: show a synthetic warning for traffic using ECDHE-RSA-AES128-SHA - [[phab:T258405|T258405]]
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 15:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 15:00 moritzm: draining restbase1026 for eventual reboot for kernel security update
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 14:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 14:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 14:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 14:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 14:51 moritzm: draining restbase1025 for eventual reboot for kernel security update
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 14:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 14:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 14:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=mobileapps,name=scb.*
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 14:35 akosiaris: decrease codfw mobileapps kubernetes traffic to 72% [[phab:T218733|T218733]]. Weird latency patterns exhibited when 92% was reached. See https://grafana.wikimedia.org/d/5CmeRcnMz/mobileapps?panelId=34&fullscreen&orgId=1&from=1595338489749&to=1595342071227&var-dc=codfw%20prometheus%2Fk8s&var-service=mobileapps&var-container_name=All
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]
* 14:35 moritzm: draining restbase1024 for eventual reboot for kernel security update
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P11994 and previous config saved to /var/cache/conftool/dbconfig/20200721-143204-marostegui.json
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11993 and previous config saved to /var/cache/conftool/dbconfig/20200721-142634-marostegui.json
* 14:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:24 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:19 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11992 and previous config saved to /var/cache/conftool/dbconfig/20200721-141813-marostegui.json
* 14:16 moritzm: draining restbase1023 for eventual reboot for kernel security update
* 14:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:03 moritzm: draining restbase1022 for eventual reboot for kernel security update
* 14:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:55 moritzm: draining restbase1021 for eventual reboot for kernel security update
* 13:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11991 and previous config saved to /var/cache/conftool/dbconfig/20200721-135028-marostegui.json
* 13:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:46 moritzm: draining restbase1020 for eventual reboot for kernel security update
* 13:42 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
* 13:41 akosiaris: increase codfw mobileapps kubernetes traffic to 96% [[phab:T218733|T218733]]
* 13:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:15 Amir1: end of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T258472|T258472]] [[phab:T258473|T258473]])
* 13:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:03 moritzm: draining restbase1019 for eventual reboot for kernel security update
* 13:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:55 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T258472|T258472]] [[phab:T258473|T258473]])
* 12:54 marostegui: Stop haproxy on dbproxy1012 - [[phab:T255408|T255408]]
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087', diff saved to https://phabricator.wikimedia.org/P11988 and previous config saved to /var/cache/conftool/dbconfig/20200721-121302-marostegui.json
* 12:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:25 Urbanecm: EU B&C window done
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b96c7ea35557888c6cec2dd19768c246bff804b}}: Enable botpasswords at checkuserwiki and stewardwiki ([[phab:T258358|T258358]], [[phab:T258355|T258355]]) (duration: 00m 57s)
* 11:11 Urbanecm: Create bot_passwords table at checkuserwiki ([[phab:T258358|T258358]])
* 11:10 Urbanecm: Create bot_passwords table at stewardwiki ([[phab:T258355|T258355]])
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5d5bb37c342310be5ca0b0e11a8490703867f4fd}}: Enable Vector opt in preference everywhere ([[phab:T254228|T254228]]) (duration: 00m 57s)
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1085 [[phab:T258360|T258360]]', diff saved to https://phabricator.wikimedia.org/P11987 and previous config saved to /var/cache/conftool/dbconfig/20200721-110854-marostegui.json
* 11:00 effie: enable puppet on  P:mediawiki::mcrouter_wancache - [[phab:T247956|T247956]]
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085 [[phab:T258360|T258360]]', diff saved to https://phabricator.wikimedia.org/P11986 and previous config saved to /var/cache/conftool/dbconfig/20200721-105852-marostegui.json
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085 [[phab:T258360|T258360]]', diff saved to https://phabricator.wikimedia.org/P11985 and previous config saved to /var/cache/conftool/dbconfig/20200721-104546-marostegui.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P11984 and previous config saved to /var/cache/conftool/dbconfig/20200721-103430-marostegui.json
* 10:20 effie: disable puppet on  P:mediawiki::mcrouter_wancache - [[phab:T247956|T247956]]
* 10:13 effie: enable puppet on on wtp*
* 10:02 marostegui: Analyze revision table on db1119 [[phab:T258480|T258480]]
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 [[phab:T258480|T258480]]', diff saved to https://phabricator.wikimedia.org/P11983 and previous config saved to /var/cache/conftool/dbconfig/20200721-100159-marostegui.json
* 09:59 akosiaris: move all codfw mobileapps nodes (kubernetes and scb) to weight 10. Traffic level remains at 72.727272% flowing to kubernetes, the rest to scb [[phab:T218733|T218733]]
* 09:59 akosiaris: move all codfw mobileapps nodes (kubernetes and scb) to weight 10. Traffic level remains at 72.727272% flowing to kubernetes, the rest to scb
* 09:59 effie: disable puppet on wtp* to merge 613307
* 09:58 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=mobileapps
* 09:58 akosiaris: increase codfw mobileapps kubernetes traffic to 72.727272% [[phab:T218733|T218733]]
* 09:57 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
* 09:44 elukey: add term 'idp' to analytics-in4/6 filters on cr1-eqiad and cr2-eqiad (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/615160)
* 09:21 kormat@cumin1001: dbctl commit (dc=all): 'Re-pool es1020 at 25% in es4 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11982 and previous config saved to /var/cache/conftool/dbconfig/20200721-092126-kormat.json
* 08:37 akosiaris: increase codfw mobileapps kubernetes traffic to 47% [[phab:T218733|T218733]]
* 08:34 akosiaris@cumin1001: conftool action : set/weight=3; selector: dc=codfw,service=mobileapps,name=scb.*
* 08:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P11980 and previous config saved to /var/cache/conftool/dbconfig/20200721-080842-marostegui.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11979 and previous config saved to /var/cache/conftool/dbconfig/20200721-075233-marostegui.json
* 07:49 marostegui: Deploy schema change on db1087, lag will appear on s8 (wikidata) on labsdb hosts [[phab:T256685|T256685]]
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T256685|T256685]]', diff saved to https://phabricator.wikimedia.org/P11978 and previous config saved to /var/cache/conftool/dbconfig/20200721-074843-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11977 and previous config saved to /var/cache/conftool/dbconfig/20200721-073757-marostegui.json
* 07:29 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Re-enable writes to es4 [[phab:T257847|T257847]] (duration: 00m 57s)
* 07:22 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1020 from es4 [[phab:T257847|T257847]]', diff saved to https://phabricator.wikimedia.org/P11976 and previous config saved to /var/cache/conftool/dbconfig/20200721-072251-kormat.json
* 07:21 kormat@cumin1001: dbctl commit (dc=all): 'Promote es1021 to es4 master [[phab:T257847|T257847]]', diff saved to https://phabricator.wikimedia.org/P11975 and previous config saved to /var/cache/conftool/dbconfig/20200721-072127-kormat.json
* 07:13 kormat: killing James_F('s script) on mwmaint1002
* 07:06 _joe_: systemctl reset-failed on deneb, the usual known issue with releng image reporting
* 07:03 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Disable writes to es4 [[phab:T257847|T257847]] (duration: 01m 00s)
* 06:59 kormat: Starting es4 failover from es1020 to es1021 [[phab:T257847|T257847]]
* 06:54 kormat@cumin1001: dbctl commit (dc=all): 'Set es1021 to weight 50 [[phab:T257847|T257847]]', diff saved to https://phabricator.wikimedia.org/P11974 and previous config saved to /var/cache/conftool/dbconfig/20200721-065457-kormat.json
* 06:54 marostegui: Pool db1119 into enwiki with MCR schema change done - [[phab:T238966|T238966]]
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11973 and previous config saved to /var/cache/conftool/dbconfig/20200721-065430-marostegui.json
* 06:27 _joe_: systemctl reset-failed on lists1001, a network interface was failing since 1 month
* 06:26 _joe_: enabling notifications for lists1001
* 06:23 _joe_: systemctl reset-failed on both centrallogs
* 02:43 eileen: civicrm revision changed from {{Gerrit|7f1e7d8e38}} to {{Gerrit|cc5d17fbaf}}, config revision is {{Gerrit|23460676f6}}
* 00:02 ryankemper: Began Elasticsearch reindex job on index `dewiki_content` across [`eqiad`, `codfw`, `cloudelastic`], on `rkemper@mwmaint1002` under tmux session `reindex`. Should complete in <24 hours


== 2020-07-20 ==
== 2021-07-22 ==
* 23:49 eileen: tools revision changed from {{Gerrit|b915d8efbd}} to {{Gerrit|22550f38c5}}
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 23:34 ejegg: updated fundraising CiviCRM from {{Gerrit|8b09c87ce2}} to {{Gerrit|7f1e7d8e38}}
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platfom migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 23:12 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/ProofreadPage/ProofreadPage.namespaces.php: {{Gerrit|03ed74f0b9b8f55d01f9112c31f2f6ea17990f9c}}: Add ProofreadPage namespace translation for lij ([[phab:T257672|T257672]]) (duration: 00m 57s)
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 23:06 Urbanecm: run mwscript namespaceDupes.php --wiki=lijwikisource -- fix ([[phab:T257672|T257672]])
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 23:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2147774caaa0819f8b5d71cc16bc021d94677702}}: Add English aliases for WS-specific namespaces to lijwikisource ([[phab:T257672|T257672]]) (duration: 00m 57s)
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 22:59 ryankemper@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 613669: cirrussearch: Allow 2 dewiki->content shards/node {{!}} https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/613669 (duration: 00m 57s)
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 21:53 eileen: tools revision changed from {{Gerrit|40d52a0008}} to {{Gerrit|b915d8efbd}}
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 21:15 sbassett: Revised mitigation deployed for [[phab:T257687|T257687]]
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 20:07 eileen: tools revision changed from {{Gerrit|711d671600}} to {{Gerrit|40d52a0008}}
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 19:10 mforns@deploy1001: Finished deploy [analytics/refinery@af86a05] (thin): Regular analytics weekly train THIN [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2] (duration: 00m 07s)
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 19:10 mforns@deploy1001: Started deploy [analytics/refinery@af86a05] (thin): Regular analytics weekly train THIN [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2]
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 19:09 mforns@deploy1001: Finished deploy [analytics/refinery@af86a05]: Regular analytics weekly train [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2] (duration: 05m 46s)
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 19:03 mforns@deploy1001: Started deploy [analytics/refinery@af86a05]: Regular analytics weekly train [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2]
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 18:37 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|df2584f181f08da0e1191f97e619e912e587b48d}}: Switch $wgUrlShortenerDomainsWhitelist --> $wgUrlShortenerAllowedDomains ([[phab:T255491|T255491]]) (duration: 00m 57s)
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dfed4727c6f9e003f9e1949b2995a0cf0ad4f1cc}}: Adding rollbacker group for arzwiki ([[phab:T258100|T258100]]) (duration: 00m 57s)
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee7ac95e16f55e850b318f7354842795e08e0270}}: Change of rollbacker group settings at jawiki ([[phab:T258339|T258339]]) (duration: 00m 57s)
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 17:36 ejegg: updated payments-wiki settings to point TY page at new URL
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 16:32 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@10afb4b]: airflow: Turn off catchup on cirrus_namespace_map (duration: 00m 25s)
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 16:31 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@10afb4b]: airflow: Turn off catchup on cirrus_namespace_map
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 16:27 akosiaris: increase codfw mobileapps kubernetes traffic to 25% [[phab:T218733|T218733]]. Take #2
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 16:27 akosiaris@cumin1001: conftool action : set/weight=8; selector: dc=codfw,service=mobileapps,name=scb.*
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 15:59 elukey: restart airflow-webserver/scheduler to pick up TLS to mysql settings
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 15:21 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 15:21 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 15:17 hnowlan: draining and restarting sessionstore2002
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 15:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 15:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 15:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 15:13 jynus: dropping and recreating nagios@localhost users on all m1 servers
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 15:09 hnowlan: draining and restarting sessionstore2001
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 15:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 15:09 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 15:08 moritzm: draining restbase2023 for eventual reboot for kernel security update
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 15:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 14:56 moritzm: draining restbase2022 for eventual reboot for kernel security update
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 14:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 14:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 14:52 hnowlan: draining and restarting sessionstore1003
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 14:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 14:52 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 14:51 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 14:51 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 14:49 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 14:49 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 14:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 14:47 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 14:47 moritzm: draining restbase2021 for eventual reboot for kernel security update
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 14:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 14:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 14:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 14:36 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@ff49fdf]: Update mobileapps to {{Gerrit|0bf7bafa}} (duration: 03m 50s)
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 14:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 14:34 hnowlan: starting drain and restart of sessionstore hosts for new kernel
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 14:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 14:32 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@ff49fdf]: Update mobileapps to {{Gerrit|0bf7bafa}}
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:26 moritzm: draining restbase2020 for eventual reboot for kernel security update
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 14:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:23 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 14:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:14 moritzm: draining restbase2019 for eventual reboot for kernel security update
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:08 ema: lvs101[34] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:07 ema: lvs1016 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:27 moritzm: installing libwebp security updates on stretch
* 14:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 13:59 ema: lvs300[56] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:57 ema: lvs3007 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 13:50 ema: lvs500[12] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:48 moritzm: draining restbase2018 for eventual reboot for kernel security update
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 13:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 13:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 13:47 ema: lvs5003 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 13:44 ema: lvs200[78] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 13:42 ema: lvs2010 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 13:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 13:31 ema: lvs400[56] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 13:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:36 Lucas_WMDE: EU backport+config window done
* 13:27 moritzm: draining restbase2017 for eventual reboot for kernel security update
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 13:24 ema: lvs4007 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 13:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 13:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 13:09 moritzm: draining restbase2016 for eventual reboot for kernel security update
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 13:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 13:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 13:07 moritzm: reset broken ifup systemd states on puppetdb* hosts
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 13:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:42 effie: restart pybal on lvs2010  and lvs1016
* 13:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 12:59 Urbanecm: creating arywiki ([[phab:T257674|T257674]]), lijwikisource ([[phab:T257672|T257672]]), sysop_itwiki ([[phab:T256545|T256545]]) done
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 12:59 moritzm: draining restbase2015 for eventual reboot for kernel security update
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 12:56 Urbanecm: Create Daimona Eaytoy at sysop_itwiki ([[phab:T256545|T256545]])
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 12:55 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 59s)
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 12:50 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating sysop_itwiki ([[phab:T256545|T256545]]) (duration: 00m 57s)
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 12:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating sysop_itwiki ([[phab:T256545|T256545]]) (duration: 00m 57s)
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 12:48 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating sysop_itwiki ([[phab:T256545|T256545]])
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 12:46 urbanecm@deploy1001: Synchronized dblists: Creating sysop_itwiki ([[phab:T256545|T256545]]) (duration: 00m 57s)
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 12:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 12:40 moritzm: draining restbase2014 for eventual reboot for kernel security update
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 12:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 12:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 12:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating lijwikisource ([[phab:T257672|T257672]]) (duration: 00m 57s)
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 12:32 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating lijwikisource ([[phab:T257672|T257672]])
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 12:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 12:30 urbanecm@deploy1001: Synchronized dblists: Creating lijwikisource ([[phab:T257672|T257672]]) (duration: 00m 56s)
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 12:28 urbanecm@deploy1001: Synchronized dblists/rtl.dblist: Add arywiki to rtl.dblist ([[phab:T257674|T257674]]) (duration: 00m 57s)
* 12:27 moritzm: draining restbase2013 for eventual reboot for kernel security update
* 12:27 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
* 12:21 urbanecm@deploy1001: Synchronized langlist: Creating arywiki ([[phab:T257674|T257674]]) (duration: 00m 56s)
* 12:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating arywiki ([[phab:T257674|T257674]]) (duration: 00m 56s)
* 12:19 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating arywiki ([[phab:T257674|T257674]]) (duration: 00m 57s)
* 12:17 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating arywiki ([[phab:T257674|T257674]])
* 12:16 urbanecm@deploy1001: Synchronized dblists: Creating arywiki ([[phab:T257674|T257674]]) (duration: 00m 57s)
* 12:02 moritzm: installing qemu security updates on buster
* 11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|946bf3d239f278b4e099f5dec676f5e2be61d8ca}}: Update brwikimedia logo and add upscaled versions (config) ([[phab:T257925|T257925]]) (duration: 00m 57s)
* 11:49 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
* 11:49 Urbanecm: Purge 'https://en.wikipedia.org/static/images/project-logos/bnwikimedia.png'
* 11:46 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|f7560b6061dd3a60ccf56c916ebf70a3f104bea7}}: Update brwikimedia logo and add upscaled versions ([[phab:T257925|T257925]]) (duration: 00m 56s)
* 11:44 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|5b97a06fa2e9a06c251a9c1fd2ddd9beec01a683}}: Set $wgUrlShortenerAllowedDomains for all wikis ([[phab:T258134|T258134]]) (duration: 00m 57s)
* 11:42 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
* 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c12f1dee6b9888849c64312c2a4fd65ecbd4091e}}: Remove wgPopupsPageBlacklist config setting ([[phab:T254676|T254676]]) (duration: 00m 57s)
* 11:35 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript createAndPromote.php testwikidatawiki --custom-groups=interface-admin --force 'Lucas Werkmeister (WMDE)'
* 11:34 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
* 11:25 Urbanecm: mwscript namespaceDupes.php --wiki=kowikiquote  --fix ([[phab:T255031|T255031]])
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3719668511231589b4fc6a723ccdfa772068ad5f}}: Add NamespaceAliases for kowikiquote ([[phab:T255031|T255031]]) (duration: 00m 57s)
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bc5671a90c65b66989e470fc41225986b2ec9fb5}}: Add media.farsnews.ir to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T253800|T253800]]) (duration: 00m 57s)
* 11:18 Urbanecm: Run mwscript updateCollation.php --wiki=bswiktionary --previous-collation=uppercase in a tmux session at mwmaint1002 ([[phab:T258346|T258346]])
* 11:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0c784784d75c2bbfb570495a6a097d4c44cbe6b3}}: Set $wgCategoryCollation to uca-bs-u-kn on Bosnian Wiktionary ([[phab:T258346|T258346]]) (duration: 00m 58s)
* 11:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6830723b0ad5031e67062ba838f09cd07c2b97a1}}: Convert ukwikisource ns:250 and ns:251 to have subpages ([[phab:T255930|T255930]]) (duration: 00m 57s)
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1c7a6215d06aff6cb0a75701292d8147f006d9e4}}: Create closer group at itwikinews ([[phab:T257927|T257927]]) (duration: 00m 57s)
* 10:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:48 moritzm: rebooting releases* hosts for kernel security update
* 10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:614698{{!}} Bumping portals to master (614698)]] (duration: 00m 56s)
* 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:614698{{!}} Bumping portals to master (614698)]] (duration: 00m 59s)
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114', diff saved to https://phabricator.wikimedia.org/P11962 and previous config saved to /var/cache/conftool/dbconfig/20200720-103058-marostegui.json
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P11961 and previous config saved to /var/cache/conftool/dbconfig/20200720-094609-marostegui.json
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P11960 and previous config saved to /var/cache/conftool/dbconfig/20200720-093154-marostegui.json
* 09:25 godog: update compiler facts
* 09:17 jayme: updating envoyproxy to 1.14.4-1 on all eqiad hosts
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P11959 and previous config saved to /var/cache/conftool/dbconfig/20200720-091119-marostegui.json
* 09:04 jayme: updating envoyproxy to 1.14.4-1 on all codfw hosts
* 07:54 moritzm: installing libopenmpt security updates
* 07:51 jayme: updating envoyproxy to 1.14.4-1 on all non mw and restbase hosts
* 07:29 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 - [[phab:T255408|T255408]]
* 07:19 marostegui: Drop non used reviewdb database - [[phab:T255715|T255715]]
* 06:55 elukey: restart matomo1002's mariadb to pick up new TLS settings
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P11958 and previous config saved to /var/cache/conftool/dbconfig/20200720-065438-marostegui.json
* 06:15 tstarling@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Score/includes/Score.php: reverting Reedy's temporary patch for hardcoding the lilypond version (duration: 00m 57s)
* 06:07 tstarling@deploy1001: Finished scap: fixing missing message from previous sync-dir (duration: 29m 57s)
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082 after a crash [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P11957 and previous config saved to /var/cache/conftool/dbconfig/20200720-055614-marostegui.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 after a crash [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P11956 and previous config saved to /var/cache/conftool/dbconfig/20200720-054747-marostegui.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 after a crash [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P11955 and previous config saved to /var/cache/conftool/dbconfig/20200720-053816-marostegui.json
* 05:37 tstarling@deploy1001: Started scap: fixing missing message from previous sync-dir
* 05:30 tstarling@deploy1001: scap sync-l10n completed (1.35.0-wmf.41) (duration: 02m 44s)
* 05:25 marostegui: Deploy MCR schema change on enwiki on db1119 - [[phab:T238966|T238966]]
* 05:24 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: disable lilypond with better error message (duration: 00m 57s)
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 after a crash [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P11953 and previous config saved to /var/cache/conftool/dbconfig/20200720-051846-marostegui.json
* 05:18 tstarling@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Score: better error message for disabling of Score (duration: 01m 10s)


== 2020-07-19 ==
== 2021-07-21 ==
* 19:16 marostegui: Upgrade and reboot db1085 [[phab:T258360|T258360]]
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 18:57 marostegui: Start mysql on db1082 [[phab:T258336|T258336]]
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:51 marostegui: Upgrade and reboot db1082 [[phab:T258336|T258336]]
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 18:45 cdanis@cumin1001: dbctl commit (dc=all): 'db1085 also crashed', diff saved to https://phabricator.wikimedia.org/P11952 and previous config saved to /var/cache/conftool/dbconfig/20200719-184511-cdanis.json
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:06 Urbanecm: Run mwscript emptyUserGroup.php --wiki=testwiki contestadmin ([[phab:T256555|T256555]])
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:27 dancy: testing upcoming Scap release on beta
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 10:50 moritzm: installing systemd security updates on bullseye
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 10:14 effie: enable puppet on mw* servers
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 08:45 effie: disble puppet on codfw mw hosts to deploy 702592
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:17 effie: enable puppet on alert*
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 07:16 godog: powercycle ms-be2048
* 07:03 moritzm: installing systemd security updates on stretch
* 06:51 effie: restart memcached on eqiad mc* hosts
* 06:51 effie: enable puppet on mc* hosts
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2020-07-18 ==
== 2021-07-20 ==
* 21:41 shdubsh: restart logstash on logstash200[456]
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 21:14 shdubsh: bounce logstash on logstash1007
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 21:10 shdubsh: bounce logstash on logstash1008
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 21:06 shdubsh: bounce logstash on logstash1009
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 20:52 marostegui: Due to db1082 crash there will be replication lag on s5 on labsdb hosts - [[phab:T258336|T258336]]
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 20:37 cdanis@cumin1001: dbctl commit (dc=all): 'depool db1082, it crashed', diff saved to https://phabricator.wikimedia.org/P11951 and previous config saved to /var/cache/conftool/dbconfig/20200718-203704-cdanis.json
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 00:13 dpifke: Performing one-time expiration of ArcLamp files older than 40 days (normal retention is 45 days), to solve disk space issue until either Ganeti issue is solved or compressed logfile support is merged.
* 17:06 rzl: enabled puppet on A:mw
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
* 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 14:53 urbanecm: Start server-side upload for 7 large PNG files ([[phab:T285708|T285708]])
* 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:46 vgutierrez: depool dns1002 - [[phab:T286069|T286069]]
* 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
* 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10{{!}}0[1-9]).codfw.wmnet
* 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - [[phab:T285643|T285643]]
* 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 12:44 moritzm: installing systemd security updates on buster
* 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
* 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
* 11:58 Lucas_WMDE: EU config+backport window done
* 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (3/3) (duration: 00m 56s)
* 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (2/3) (duration: 00m 56s)
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (1/3) (duration: 00m 56s)
* 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 3/3) (duration: 00m 56s)
* 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 2/3) (duration: 00m 56s)
* 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 1/3) (duration: 00m 57s)
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705498{{!}}Add patroller group for ckbwiki (T285221)]] (duration: 00m 57s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (2/2, beta) (duration: 00m 56s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (1/2, prod) (duration: 00m 57s)
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704867{{!}}Update config for language switching on pilot wikis (T286459)]] (duration: 00m 59s)
* 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
* 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
* 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
* 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
* 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
* 03:17 eileen: civicrm revision changed from {{Gerrit|20e9ef6bbb}} to {{Gerrit|819c11307d}}, config revision is {{Gerrit|bb405c5232}}


== 2020-07-17 ==
== 2021-07-19 ==
* 21:16 dpifke: Removing MongoDB packages and data from webperf1002.
* 20:48 urbanecm: Deploy security patch for [[phab:T286884|T286884]]
* 17:39 dpifke@deploy1001: Finished deploy [performance/arc-lamp@a5d2fd3]: (no justification provided) (duration: 00m 05s)
* 20:29 vgutierrez: pool text@codfw - [[phab:T286921|T286921]]
* 17:38 dpifke@deploy1001: Started deploy [performance/arc-lamp@a5d2fd3]: (no justification provided)
* 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:53 akosiaris: powercycle kubernetes2002
* 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104', diff saved to https://phabricator.wikimedia.org/P11944 and previous config saved to /var/cache/conftool/dbconfig/20200717-122400-marostegui.json
* 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467{{!}}prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P11941 and previous config saved to /var/cache/conftool/dbconfig/20200717-120126-marostegui.json
* 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P11940 and previous config saved to /var/cache/conftool/dbconfig/20200717-115155-marostegui.json
* 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448{{!}}Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P11939 and previous config saved to /var/cache/conftool/dbconfig/20200717-113800-marostegui.json
* 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - [[phab:T286921|T286921]]
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104', diff saved to https://phabricator.wikimedia.org/P11938 and previous config saved to /var/cache/conftool/dbconfig/20200717-113050-marostegui.json
* 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - [[phab:T286921|T286921]]
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104', diff saved to https://phabricator.wikimedia.org/P11937 and previous config saved to /var/cache/conftool/dbconfig/20200717-112413-marostegui.json
* 18:46 brennen: gerrit1001: restarting gerrit
* 09:15 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1280.eqiad.wmnet
* 18:40 vgutierrez: stop pybal on lvs2009  - [[phab:T286921|T286921]]
* 09:12 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1280.eqiad.wmnet
* 18:38 brennen: re-enabling puppet on gerrit1001]
* 08:48 moritzm: imported prometheus-atlas-exporter 1.0+git20191204.ffafab7-2 to buster-wikimedia [[phab:T247967|T247967]]
* 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - [[phab:T286921|T286921]]
* 08:29 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 08:05 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
* 07:54 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P<nowiki>{</nowiki>cloudelastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104', diff saved to https://phabricator.wikimedia.org/P11936 and previous config saved to /var/cache/conftool/dbconfig/20200717-075124-marostegui.json
* 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - [[phab:T286921|T286921]]
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1111', diff saved to https://phabricator.wikimedia.org/P11935 and previous config saved to /var/cache/conftool/dbconfig/20200717-074335-marostegui.json
* 18:20 vgutierrez: enabling pybal on lvs2007 - [[phab:T286921|T286921]]
* 07:34 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 18:19 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 07:34 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
* 07:33 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:06 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:705437{{!}}pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s)
* 07:33 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 07:32 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 07:30 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 06:30 XioNoX: rename msw1-codfw interface range
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 06:28 XioNoX: rename msw1-eqiad interface range
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
* 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P11934 and previous config saved to /var/cache/conftool/dbconfig/20200717-044748-marostegui.json
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092', diff saved to https://phabricator.wikimedia.org/P11933 and previous config saved to /var/cache/conftool/dbconfig/20200717-044658-marostegui.json
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
* 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
* 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
* 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
* 17:30 volans: running puppet on elastic2038 after nework was restored
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
* 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:23 volans: running authdns-update to force-update authdns2001
* 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:21 XioNoX: remove ns1 redirect - [[phab:T286787|T286787]]
* 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
* 17:10 XioNoX: enable asw-a2-codfw access ports - [[phab:T286787|T286787]]
* 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - [[phab:T286787|T286787]]
* 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
* 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
* 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
* 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
* 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
* 16:40 XioNoX: update asw-a2-codfw serial number - [[phab:T286787|T286787]]
* 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
* 16:21 mutante: depooled logstash2021 for dcops maintenance work
* 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
* 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
* 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|310be45f7}} (duration: 00m 57s)
* 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
* 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
* 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2bdfbd258e}} (duration: 00m 57s)
* 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I069c7b53}} (duration: 00m 58s)
* 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
* 15:10 godog: +100G to prometheus/ops in codfw
* 14:59 vgutierrez: rolling restart of eqiad pybal instances
* 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:42 vgutierrez: rolling restart of codfw pybal instances
* 14:33 vgutierrez: rolling restart of eqsin pybal instances
* 14:23 vgutierrez: rolling restart of ulsfo pybal instances
* 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
* 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
* 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
* 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
* 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
* 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
* 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
* 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
* 11:40 moritzm: installing bluez security updates
* 11:31 Lucas_WMDE: EU backport+config window done
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:703205{{!}}Add config for updated PropertySuggester beta cluster (T285098)]] (beta-only) (duration: 00m 57s)
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 09:52 moritzm: imported megacli for bullseye-wikimedia [[phab:T282272|T282272]] [[phab:T275873|T275873]]
* 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
* 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
* 08:15 vgutierrez: depool codfw text traffic
* 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
* 03:26 twentyafterfour: restarted phd on phab1001
* 03:25 twentyafterfour: investigating PHD failure


== 2020-07-16 ==
== 2021-07-16 ==
* 22:15 mutante: testreduce1001 manually git clone 'scandium' branch of integration/visualdiff into /srv/visualdiff ([[phab:T257906|T257906]])
* 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 21:54 crusnov@deploy1001: Finished deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 3 (duration: 01m 49s)
* 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 21:52 crusnov@deploy1001: Started deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 3
* 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 21:42 crusnov@deploy1001: Finished deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 2 (duration: 01m 33s)
* 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 21:41 crusnov@deploy1001: Started deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 2
* 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
* 21:40 crusnov@deploy1001: Finished deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 (duration: 01m 01s)
* 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
* 21:39 crusnov@deploy1001: Started deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7
* 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P<nowiki>{</nowiki>elastic2*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
* 21:08 cstone: payments-wiki revision changed from {{Gerrit|91852dbc9b}} to {{Gerrit|bf91f8adff}}
* 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
* 20:32 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable client error logging on Catalan Wikipedia ([[phab:T258073|T258073]]) (duration: 00m 57s)
* 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
* 19:32 sbassett: Deployed mitigations for [[phab:T257687|T257687]]
* 15:48 vgutierrez: restart pybal on lvs2010
* 19:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T248418|T248418]] TimedMediaHandler: Make videojs the only player on all group0 (duration: 00m 57s)
* 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:54 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 18:53 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:24 godog: downtime flappy pages in codfw for 40 minutes
* 18:50 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:14 godog: set alert2001 as active in netbox (was staged) - [[phab:T247966|T247966]]
* 18:49 addshore: deployment windows finished with
* 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
* 18:46 addshore@deploy1001: Synchronized wmf-config/extension-list: [[gerrit:611393]] extension-list: Load WikibaseClient via JSON (duration: 00m 56s)
* 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
* 18:36 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:613226]] Wikibase: Always set wgWBRepoSettings idGeneratorSeparateDbConnection PT 2/2 (duration: 00m 56s)
* 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw [[phab:T286787|T286787]]
* 18:35 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:613226]] Wikibase: Always set wgWBRepoSettings idGeneratorSeparateDbConnection PT 1/2 (duration: 00m 56s)
* 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
* 18:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:613165]] [[phab:T138104|T138104]] Wikibase: stop setting wmgWikibaseTmpSerializeEmptyListsAsObjects (duration: 00m 57s)
* 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 18:23 addshore@deploy1001: Synchronized wmf-config/config/incubatorwiki.yaml: [[gerrit:613199]] [[phab:T256957|T256957]] Move VisualEditor from beta to default on incubatorwiki PT2/2 (duration: 00m 57s)
* 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
* 18:22 addshore@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: [[gerrit:613199]] [[phab:T256957|T256957]] Move VisualEditor from beta to default on incubatorwiki PT1/2 (duration: 00m 56s)
* 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
* 18:20 addshore@deploy1001: Synchronized wmf-config/config/nlwikimedia.yaml: [[gerrit:613198]] [[phab:T256142|T256142]] Move VisualEditor from beta to default on nlwikimedia PT2/2 (duration: 00m 57s)
* 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 18:18 addshore@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: [[gerrit:613198]] [[phab:T256142|T256142]] Move VisualEditor from beta to default on nlwikimedia PT1/2 (duration: 00m 56s)
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 18:14 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:613164]] [[phab:T138104|T138104]] Wikibase: stop setting wgWBRepoSettings tmpSerializeEmptyListsAsObjects (duration: 00m 57s)
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 18:12 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:613192]] [[phab:T246420|T246420]] Enable limited-width layout for Modern Vector (duration: 00m 56s)
* 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 18:08 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:612870]] [[phab:T246977|T246977]] Disable affinity quicksurveys for the following wikis (duration: 00m 57s)
* 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 18:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 18:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
* 17:54 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
* 17:53 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers ([[phab:T279309|T279309]])
* 17:50 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 17:50 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 17:49 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 17:17 XioNoX: msw1-eqiad delete unused VC-ports
* 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 17:05 XioNoX: msw1-codfw - replace member-range with list of individual interfaces
* 12:39 mutante: mw1412 through mw1428 - set to active in netbox ([[phab:T279309|T279309]])
* 16:45 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/: Backport: [[gerrit:613173{{!}}Re add OtherProjectsSidebarGenerator::buildProjectLinkSidebarFromItemId (T258184)]] (duration: 01m 02s)
* 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:11 effie: reboot rdb1009 - [[phab:T254990|T254990]]
* 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
* 16:06 effie: Reboot rdb1010 - [[phab:T254990|T254990]]
* 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
* 15:51 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/: Backport: [[gerrit:613170{{!}}Revert "Revert "Removes OtherProjectsSidebar hook"" (T258184)]] (duration: 01m 02s)
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
* 15:40 lucaswerkmeister-wmde@deploy1001: scap failed: average error rate on 7/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
* 15:15 akosiaris: lower codfw mobileapps kubernetes traffic to 10% [[phab:T218733|T218733]]. Will open up task for it
* 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:15 akosiaris@cumin1001: conftool action : set/weight=24; selector: dc=codfw,service=mobileapps,name=scb.*
* 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
* 15:07 XioNoX: repool eqsin - [[phab:T257154|T257154]]
* 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
* 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
* 15:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
* 15:00 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
* 14:54 XioNoX: load config on cr3-eqsin - [[phab:T257154|T257154]]
* 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 14:54 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/: Backport: [[gerrit:613167{{!}}Avoid trying to register wikibase.Site twice (T258065)]] (duration: 01m 03s)
* 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 14:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:31 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 14:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 14:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 14:12 moritzm: rebooting webperf hosts in eqiad for kernel update
* 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 14:09 XioNoX: upgrade junos on cr3-eqsin - [[phab:T257154|T257154]]
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 14:03 jayme: published image docker-registry.discovery.wmnet/envoy:1.14.4-1
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
* 13:47 XioNoX: remove nonstop-bridging from asw1-eqsin
* 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
* 13:36 XioNoX: power-off cr3-eqsin - [[phab:T257154|T257154]]
* 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
* 13:36 akosiaris: increase codfw mobileapps kubernetes traffic to 25% [[phab:T218733|T218733]]
* 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:35 akosiaris@cumin1001: conftool action : set/weight=8; selector: dc=codfw,service=mobileapps,name=scb.*
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:30 XioNoX: deactivate BGP groups IX/Transit/PyBal on cr3-eqsin - [[phab:T257154|T257154]]
* 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:27 moritzm: installing an-tool1008
* 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:23 XioNoX: depool eqsin for cr3 replacement - [[phab:T257154|T257154]]
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
* 13:13 volans@deploy1001: Finished deploy [homer/deploy@fcf4332]: Force deploy of the homer plugin (duration: 01m 27s)
* 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 13:12 volans@deploy1001: Started deploy [homer/deploy@fcf4332]: Force deploy of the homer plugin
* 13:04 kormat: restarting tendril to pick up new mariadb config [[phab:T257816|T257816]]
* 13:02 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.41
* 13:02 akosiaris: increase codfw mobileapps kubernetes traffic to 10% [[phab:T218733|T218733]]
* 13:01 akosiaris@cumin1001: conftool action : set/weight=24; selector: dc=codfw,service=mobileapps,name=scb.*
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092', diff saved to https://phabricator.wikimedia.org/P11926 and previous config saved to /var/cache/conftool/dbconfig/20200716-125643-marostegui.json
* 12:56 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR607011 (duration: 04m 32s)
* 12:52 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR607011
* 12:42 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR607011 (duration: 03m 42s)
* 12:38 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR607011
* 12:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:36 akosiaris@cumin1001: conftool action : set/weight=50; selector: dc=codfw,service=mobileapps,name=scb.*
* 12:35 akosiaris: increase codfw mobileapps kubernetes traffic to 5% [[phab:T218733|T218733]]
* 12:35 akosiaris: increase codfw mobileapps kubernetes traffic to 5%
* 12:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:22 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:12 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:12 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:08 jayme: updated envoyproxy to 1.14.4-1 on mw-canary and restbase-canary
* 11:44 XioNoX: remove BGP to AS396253 in eqdfw (peer left the IX)
* 11:26 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/UrlShortener/includes/UrlShortenerUtils.php: [[phab:T258134|T258134]] Fix config variables regex concatenation (duration: 01m 05s)
* 11:23 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[phab:T254315|T254315]] [[gerrit:612670]] Wikibase: remove wmgWikibaseLocalEntitySourceName (duration: 01m 05s)
* 11:18 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T254315|T254315]] [[phab:T257266|T257266]] [[gerrit:609988]] Wikidata client wikis: Define entity sources configuration (take 3) (duration: 01m 08s)
* 10:17 jbond42: upgrade to hiera5
* 10:08 jbond42: disable puppet for hiera5 deployment
* 09:37 jayme: updated envoyproxy to 1.14.4-1 on mw1325.eqiad.wmnet and restbase1026.eqiad.wmnet
* 09:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:15 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 09:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:15 moritzm: rebooting flowspec1001
* 08:52 jayme: updated envoyproxy to 1.14.4-1 on mwdebug1001.eqiad.wmnet
* 08:41 moritzm: installing sqlite3 security updates
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2081', diff saved to https://phabricator.wikimedia.org/P11924 and previous config saved to /var/cache/conftool/dbconfig/20200716-083954-marostegui.json
* 08:35 XioNoX: Remove PIM/IGMP related CR stanza (acls) - [[phab:T257573|T257573]]
* 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:26 moritzm: installing dbus security updates
* 08:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:24 XioNoX: remove igmp-snooping from access switches - [[phab:T257573|T257573]]
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:15 moritzm: installing python-urllib3 security updates
* 08:15 XioNoX: remove PIM config from eqord/eqdfw/knams routers - [[phab:T257573|T257573]]
* 08:14 XioNoX: remove PIM config from eqiad routers - [[phab:T257573|T257573]]
* 08:11 XioNoX: remove PIM config from esams routers - [[phab:T257573|T257573]]
* 08:09 XioNoX: remove PIM config from eqsin routers - [[phab:T257573|T257573]]
* 08:08 jbond42: update mail delivery for phabricator to use phabricator.discovery.wmnet cname
* 08:07 XioNoX: remove PIM config from codfw routers - [[phab:T257573|T257573]]
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P11923 and previous config saved to /var/cache/conftool/dbconfig/20200716-080613-marostegui.json
* 08:03 XioNoX: remove PIM config from ulsfo routers - [[phab:T257573|T257573]]
* 07:41 jayme: imported envoyproxy_1.14.4-1 to stretch-wikimedia
* 07:31 jayme: imported envoyproxy_1.14.4-1 to buster-wikimedia
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1131', diff saved to https://phabricator.wikimedia.org/P11922 and previous config saved to /var/cache/conftool/dbconfig/20200716-072838-marostegui.json
* 07:25 marostegui: Drop database reviewdb-test [[phab:T255715|T255715]]
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1131', diff saved to https://phabricator.wikimedia.org/P11921 and previous config saved to /var/cache/conftool/dbconfig/20200716-070331-marostegui.json
* 06:40 XioNoX: remove peering with AS8403 in eqsin (peer left the IX)
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1131', diff saved to https://phabricator.wikimedia.org/P11920 and previous config saved to /var/cache/conftool/dbconfig/20200716-051342-marostegui.json
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1131', diff saved to https://phabricator.wikimedia.org/P11919 and previous config saved to /var/cache/conftool/dbconfig/20200716-051109-marostegui.json


== 2020-07-15 ==
== 2021-07-15 ==
* 23:54 eileen: tools revision changed from {{Gerrit|7b6018a16e}} to {{Gerrit|711d671600}}
* 23:32 brennen: checking stashbot: [[phab:T286756|T286756]]
* 23:50 eileen: process-control config revision is {{Gerrit|1fc4a9686d}}
* 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: [[gerrit:704815{{!}}Fix creation of mw.Message objects (T286385)]] (duration: 00m 57s)
* 23:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # [[phab:T285811|T285811]]
* 23:04 bd808: tools.admin Removed valhallasw from maintainers ([[phab:T255697|T255697]])
* 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # [[phab:T285811|T285811]]
* 23:02 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # [[phab:T285811|T285811]]
* 22:58 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
* 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 22:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki [[phab:T284928|T284928]]
* 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
* 22:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 22:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 22:29 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 22:29 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 22:27 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 22:21 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 22:21 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 22:10 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
* 18:16 brennen: restarting jenkins for upgrade
* 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
* 18:00 mutante: DNS - new language 'avk' has been added - This language is called Kotava and is "a proposed international auxiliary language (IAL) that focuses especially on the principle of cultural neutrality". Learn more at https://en.wikipedia.org/wiki/Kotava
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 17:32 mutante: puppetmaster - revoking cert for planet.discovery.wmnet, add planet.wikimedia.org, remove planet.svc records, remove specific and outdated hostnames ([[phab:T257840|T257840]])
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 16:11 moritzm: uploaded jenkins 2.235.2 to thirdparty/ci for stretch/buster [[phab:T257614|T257614]]
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 06s)
* 15:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 07s)
* 15:20 moritzm: rebooting webperf* hosts for kernel update
* 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
* 14:58 addshore@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/repo: [[gerrit:612723]] Stop checking if WikibaseLib is loaded [[phab:T258062|T258062]] (already on mwmaint1002) (duration: 01m 08s)
* 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
* 14:51 addshore: pulled https://gerrit.wikimedia.org/r/612723 onto mwmaint 1002 ahead of syncing everywhere (and CI finishing)
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:37 ema: A:cp: upgrade purged to 0.17 [[phab:T257573|T257573]]
* 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
* 14:30 ema: upload purged 0.17 to buster-wikimedia [[phab:T257573|T257573]]
* 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 14:28 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add exceptional wikitech VE/Parsoid config [[phab:T241961|T241961]] (duration: 01m 04s)
* 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
* 14:26 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add exceptional wikitech VE/Parsoid config [[phab:T241961|T241961]] (duration: 01m 05s)
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 14:25 gehel: repooling wdqs1006 - catched up on lag
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 14:12 akosiaris: increase codfw mobileapps kubernetes traffic to 2% [[phab:T218733|T218733]]
* 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
* 14:10 akosiaris@cumin1001: conftool action : set/weight=132; selector: dc=codfw,service=mobileapps,name=scb.*
* 16:40 ejegg: updated payments-wiki from {{Gerrit|d9892207c1}} to {{Gerrit|844b59ee42}}
* 13:58 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/UrlShortener/includes/UrlShortenerUtils.php: [[phab:T258056|T258056]] Add temporary fix to ensure array is passed to array_map() (duration: 01m 08s)
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 13:54 akosiaris: pool kubernetes nodes for mobileapps in codfw
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 13:53 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=mobileapps,name=kubernetes.*
* 16:27 ejegg: updated fundraising CiviCRM from {{Gerrit|e0d53c92b5}} to {{Gerrit|20e9ef6bbb}}
* 13:53 akosiaris@cumin1001: conftool action : set/weight=264; selector: dc=codfw,service=mobileapps,name=scb.*
* 16:24 ejegg: updated payments-wiki from {{Gerrit|0e7800027a}} to {{Gerrit|844b59ee42}}
* 13:51 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=kubernetes.*
* 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 13:04 jforrester@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.41 (duration: 01m 05s)
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:03 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.41
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 11:59 addshore: deploy window closed / done :)
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 11:57 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:609987]] Commons: Define entity sources configuration (take 2) [[phab:T254315|T254315]] (duration: 01m 03s)
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 11:36 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:612668]] Wikibase test: Client local entity sources are always testwikidata [[phab:T254315|T254315]] (duration: 01m 05s)
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 11:27 addshore@deploy1001: Synchronized wmf-config: [[phab:T254315|T254315]] [[gerrit:612669]] Wikidata test: Split client db lists. PT2/2 (duration: 01m 06s)
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 11:26 addshore@deploy1001: Synchronized dblists/wikidataclient.dblist: [[phab:T254315|T254315]] [[gerrit:612669]] Wikidata test: Split client db lists. PT1/2 (duration: 01m 05s)
* 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704773{{!}}flaggedrevs: Allow admins of idwiki to change stablesettings (T268317)]], try II (duration: 01m 05s)
* 11:16 XioNoX: remove as-path prepending in esams
* 15:03 Amir1: temporary becoming admin on idwiki to debug [[phab:T268317|T268317]]
* 11:11 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: LABS [[gerrit:612667]] Wikibase labs: All client "local" entity sources are wikidata [[phab:T254315|T254315]] (duration: 01m 04s)
* 15:02 moritzm: installing nginx security updates on ms-fe*
* 11:08 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:612666]] Wikibase: Split localEntitySourceName config for repo and client [[phab:T254315|T254315]] (duration: 01m 16s)
* 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 11:05 XioNoX: re-enable ping offload in esams
* 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 11:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 11:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 10:56 XioNoX: disable ping offload in esams
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 10:55 XioNoX: re-enable ping offload in codfw
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 10:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:45 XioNoX: disable ping offload in codfw
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 10:44 XioNoX: re-enable ping offload in eqiad
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 10:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
* 10:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 10:31 XioNoX: disable ping offload in eqiad
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 10:31 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 10:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:30 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11916 and previous config saved to /var/cache/conftool/dbconfig/20200715-102605-marostegui.json
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 10:20 jayme: updating python3-docker-report to 0.0.5-1 on deneb
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11915 and previous config saved to /var/cache/conftool/dbconfig/20200715-100855-marostegui.json
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 10:07 jayme: imported docker-report_0.0.5-1 to buster-wikimedia
* 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
* 09:48 marostegui: Deploy schema change on s8 codfw master, lag will appear on codfw [[phab:T256685|T256685]]
* 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11914 and previous config saved to /var/cache/conftool/dbconfig/20200715-094226-marostegui.json
* 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
* 09:22 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
* 09:21 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
* 09:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:19 akosiaris: deploy mobileapps in kubernetes to talk HTTPS to the mw API
* 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
* 09:10 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 09:10 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet