Difference between revisions of "Server Admin Log"

imported>Stashbot
(akosiaris: sudo cumin -b 1 -s 120 'cp500[2,3,5,6].eqsin.wmnet' 'systemctl restart varnish-frontend.service')
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(152 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== 2021-02-14 ==
== 2021-08-03 ==
* 13:13 akosiaris: sudo cumin -b 1 -s 120 'cp500[2,3,5,6].eqsin.wmnet' 'systemctl restart varnish-frontend.service'
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:10 _joe_: restarted varnish-fe on cp5004
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:09 akosiaris: restart varnish-fe on cp5001
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770|Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 09:27 joal@deploy1001: Finished deploy [analytics/refinery@dd5f947] (thin): Hotfix analytics deployment - THIN [analytics/refinery@dd5f947] (duration: 00m 06s)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:27 joal@deploy1001: Started deploy [analytics/refinery@dd5f947] (thin): Hotfix analytics deployment - THIN [analytics/refinery@dd5f947]
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:27 joal@deploy1001: Finished deploy [analytics/refinery@dd5f947]: Hotfix analytics deployment [analytics/refinery@dd5f947] (duration: 12m 52s)
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 09:14 joal@deploy1001: Started deploy [analytics/refinery@dd5f947]: Hotfix analytics deployment [analytics/refinery@dd5f947]
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master [[phab:T287574|T287574]]
* 09:12 moritzm: installing php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2021-02-13 ==
== 2021-08-02 ==
* 03:23 ryankemper: Depooled `wdqs1006` to catch up on lag
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 03:23 ryankemper: Restarted blazegraph on `wdqs1006`
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 01:30 crusnov@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mwdebug1002.eqiad.wmnet
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 01:00 crusnov@cumin1001: START - Cookbook sre.ganeti.makevm for new host mwdebug1002.eqiad.wmnet
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:55 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host mwdebug1002.eqiad.wmnet
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 00:49 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host mwdebug1002.eqiad.wmnet
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:38 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host mwdebug1002.eqiad.wmnet
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1281.eqiad.wmnet
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1282.eqiad.wmnet
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 00:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1283.eqiad.wmnet
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 00:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1284.eqiad.wmnet
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 00:30 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host mwdebug1002.eqiad.wmnet
* 21:31 tzatziki: removing 1 file for legal compliance
* 00:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host mwdebug1002.eqiad.wmnet
* 21:16 tzatziki: removing 7 files for legal compliance
* 00:26 mutante: ganeti - attempting to recreate VM mwdebug1002 with cookbook that was previously deleted manually ([[phab:T274689|T274689]] [[phab:T274023|T274023]])
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 00:25 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host mwdebug1002.eqiad.wmnet
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:08 mutante: ganeti1011 - manually deleting VM mwdebug1002 - [[phab:T274689|T274689]] [[phab:T274023|T274023]]
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 19:00 urbanecm: Morning B&C window completed
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 12:20 mutante: gerrit servers: disabling puppet
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 11:27 hashar: restarting Jenkins on contint2001
* 11:27 hashar: restarting Jenkins on contint1001
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 urbanecm: EU B&C window completed
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 11:08 moritzm: installing openjdk-11 security updates
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 07:24 moritzm: installing libsndfile security updates on buster
* 07:12 moritzm: installing aspell security updates
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)


== 2021-02-12 ==
== 2021-07-31 ==
* 23:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1348.eqiad.wmnet
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 23:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1356.eqiad.wmnet
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
* 23:43 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 23:42 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1283.eqiad.wmnet
* 23:42 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1282.eqiad.wmnet
* 23:42 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1348.eqiad.wmnet
* 23:41 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1356.eqiad.wmnet
* 23:41 legoktm@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1221.eqiad.wmnet
* 23:39 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1221.eqiad.wmnet
* 23:38 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1281.eqiad.wmnet
* 23:26 dduvall@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 23:24 dduvall@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 23:14 dduvall@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 23:02 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 22:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 22:51 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1284.eqiad.wmnet with reason: REIMAGE
* 22:49 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1283.eqiad.wmnet with reason: REIMAGE
* 22:48 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1284.eqiad.wmnet with reason: REIMAGE
* 22:47 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1283.eqiad.wmnet with reason: REIMAGE
* 22:47 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1282.eqiad.wmnet with reason: REIMAGE
* 22:45 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1282.eqiad.wmnet with reason: REIMAGE
* 22:44 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1281.eqiad.wmnet with reason: REIMAGE
* 22:42 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1281.eqiad.wmnet with reason: REIMAGE
* 22:32 krinkle@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: {{Gerrit|Idc385de0}} cleanup (duration: 05m 14s)
* 22:15 krinkle@deploy1001: Synchronized wmf-config/etcd.php: {{Gerrit|b3447343a}} cleanup (duration: 05m 20s)
* 22:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1348.eqiad.wmnet with reason: REIMAGE
* 21:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1348.eqiad.wmnet with reason: REIMAGE
* 21:26 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1357.eqiad.wmnet
* 20:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1357.eqiad.wmnet with reason: REIMAGE
* 20:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1356.eqiad.wmnet with reason: REIMAGE
* 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1357.eqiad.wmnet with reason: REIMAGE
* 20:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1356.eqiad.wmnet with reason: REIMAGE
* 20:36 mutante: mwdebug1003 now on buster - mwdebug1002 rebooting and reimaging to buster
* 20:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mwdebug1002.eqiad.wmnet with reason: OS upgrade
* 20:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mwdebug1002.eqiad.wmnet with reason: OS upgrade
* 20:32 mutante: mw1353, mw1358 - scap pull, repooled
* 20:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1353.eqiad.wmnet
* 20:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1358.eqiad.wmnet
* 20:17 mutante: mwdebug2001 - restarted memcached
* 20:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1358.eqiad.wmnet
* 20:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1353.eqiad.wmnet
* 19:56 mutante: mwdebug2002 - restart memcached
* 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1358.eqiad.wmnet with reason: OS upgrade
* 19:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1358.eqiad.wmnet with reason: OS upgrade
* 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1353.eqiad.wmnet with reason: OS upgrade
* 19:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1353.eqiad.wmnet with reason: OS upgrade
* 19:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mwdebug1003.eqiad.wmnet with reason: OS upgrade
* 19:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mwdebug1003.eqiad.wmnet with reason: OS upgrade
* 19:43 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: roll back commonswiki to 1.36.0-wmf.27 due to [[phab:T274589|T274589]]
* 19:42 mutante: mwdebug2001 now on buster - mwdebug1003 rebooting and reimaging to stretch
* 19:38 milimetric@deploy1001: Finished deploy [analytics/refinery@e0c09a2] (thin): Fix for mediarequest per file cassandra job - 2 (duration: 00m 06s)
* 19:38 milimetric@deploy1001: Started deploy [analytics/refinery@e0c09a2] (thin): Fix for mediarequest per file cassandra job - 2
* 19:38 milimetric@deploy1001: Finished deploy [analytics/refinery@e0c09a2]: Fix for mediarequest per file cassandra job - 2 (duration: 11m 01s)
* 19:34 twentyafterfour: Train status: Rolling back commonswiki to wmf.27 due to [[phab:T274589|T274589]] (refs [[phab:T271344|T271344]])
* 19:29 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1358.eqiad.wmnet with reason: REIMAGE
* 19:28 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1353.eqiad.wmnet with reason: REIMAGE
* 19:28 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on aqs1015.eqiad.wmnet with reason: REIMAGE
* 19:27 milimetric@deploy1001: Started deploy [analytics/refinery@e0c09a2]: Fix for mediarequest per file cassandra job - 2
* 19:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1358.eqiad.wmnet with reason: REIMAGE
* 19:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1353.eqiad.wmnet with reason: REIMAGE
* 19:23 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1015.eqiad.wmnet with reason: REIMAGE
* 19:20 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1013.eqiad.wmnet with reason: REIMAGE
* 19:18 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1013.eqiad.wmnet with reason: REIMAGE
* 19:18 milimetric@deploy1001: Finished deploy [analytics/refinery@366962f]: Fix for mediarequest per file cassandra job (duration: 11m 58s)
* 19:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
* 19:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1011.eqiad.wmnet with reason: REIMAGE
* 19:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
* 19:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1010.eqiad.wmnet with reason: REIMAGE
* 19:13 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1011.eqiad.wmnet with reason: REIMAGE
* 19:11 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1010.eqiad.wmnet with reason: REIMAGE
* 19:06 milimetric@deploy1001: Started deploy [analytics/refinery@366962f]: Fix for mediarequest per file cassandra job
* 19:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mwdebug2001.codfw.wmnet with reason: OS upgrade
* 19:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mwdebug2001.codfw.wmnet with reason: OS upgrade
* 19:02 mutante: rebooting and reimaging mwdebug2001 to buster [[phab:T274023|T274023]]
* 18:35 mutante: mwdebug2002 now a buster VM; you can find a .tar.gz in your home dir with the contents of your previous home
* 18:30 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@98264b8]: airflow: review and correct usage of catchup=False (duration: 03m 10s)
* 18:27 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@98264b8]: airflow: review and correct usage of catchup=False
* 17:33 elukey@cumin1001: END (PASS) - Cookbook sre.presto.reboot-workers (exit_code=0) for Presto analytics cluster: Reboot Presto nodes - elukey@cumin1001
* 17:23 bblack: cp*: re-enabling puppet after successful agent run on one host as a test!
* 17:13 bblack: cp*: disable puppet ahead of https://gerrit.wikimedia.org/r/c/operations/puppet/+/663845
* 17:08 elukey@cumin1001: START - Cookbook sre.presto.reboot-workers for Presto analytics cluster: Reboot Presto nodes - elukey@cumin1001
* 17:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1005.eqiad.wmnet
* 16:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-tool1005.eqiad.wmnet
* 16:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
* 16:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1001.eqiad.wmnet
* 16:38 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1001.eqiad.wmnet
* 16:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
* 16:26 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
* 16:12 mforns@deploy1001: Finished deploy [analytics/refinery@9cd1297] (hadoop-test): Fix for data quality alarms after BigTop migration TEST [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5] (duration: 04m 05s)
* 16:11 hnowlan: joining maps2007 to cassandra cluster
* 16:08 mforns@deploy1001: Started deploy [analytics/refinery@9cd1297] (hadoop-test): Fix for data quality alarms after BigTop migration TEST [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5]
* 16:08 mforns@deploy1001: Finished deploy [analytics/refinery@9cd1297] (thin): Fix for data quality alarms after BigTop migration THIN [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5] (duration: 00m 06s)
* 16:07 mforns@deploy1001: Started deploy [analytics/refinery@9cd1297] (thin): Fix for data quality alarms after BigTop migration THIN [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5]
* 16:07 mforns@deploy1001: Finished deploy [analytics/refinery@9cd1297]: Fix for data quality alarms after BigTop migration [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5] (duration: 38m 56s)
* 15:28 mforns@deploy1001: Started deploy [analytics/refinery@9cd1297]: Fix for data quality alarms after BigTop migration [analytics/refinery@9cd129764edbac04b192c922ec0a975bc47455a5]
* 15:22 herron: rolling reboot of alert[12]001 hosts for updates
* 15:16 elukey: roll restart druid broker on druid-public to pick up new settings
* 14:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1022.eqiad.wmnet
* 14:32 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1022.eqiad.wmnet
* 14:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1090.eqiad.wmnet
* 14:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1089.eqiad.wmnet
* 14:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3065.esams.wmnet
* 14:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3064.esams.wmnet
* 13:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3065.esams.wmnet
* 13:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3064.esams.wmnet
* 13:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1090.eqiad.wmnet
* 13:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1089.eqiad.wmnet
* 13:10 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1005.eqiad.wmnet
* 12:12 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps2007.codfw.wmnet with reason: Resyncing database
* 12:11 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps2007.codfw.wmnet with reason: Resyncing database
* 11:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2042.codfw.wmnet
* 11:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2041.codfw.wmnet
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2042.codfw.wmnet
* 11:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2041.codfw.wmnet
* 11:27 moritzm: installing emacs updates from buster point release
* 11:25 moritzm: installing device-tree-compiler updates from buster point release
* 11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1088.eqiad.wmnet
* 11:22 moritzm: installing node-ini security updates
* 11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1087.eqiad.wmnet
* 11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3062.esams.wmnet
* 11:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3063.esams.wmnet
* 11:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
* 11:14 moritzm: installing golang-1.11 security updates
* 11:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: REIMAGE
* 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3063.esams.wmnet
* 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3062.esams.wmnet
* 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1088.eqiad.wmnet
* 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1087.eqiad.wmnet
* 11:10 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 100%', diff saved to https://phabricator.wikimedia.org/P14337 and previous config saved to /var/cache/conftool/dbconfig/20210212-111010-jynus.json
* 11:06 moritzm: installing xcftools security updates
* 10:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2039.codfw.wmnet
* 10:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2040.codfw.wmnet
* 10:50 legoktm: repooled registry1002 after revert
* 10:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2040.codfw.wmnet
* 10:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2039.codfw.wmnet
* 10:39 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 75%', diff saved to https://phabricator.wikimedia.org/P14336 and previous config saved to /var/cache/conftool/dbconfig/20210212-103921-jynus.json
* 10:24 moritzm: installing wireshark security updates for stretch
* 10:22 legoktm: depooled registry1002 while fixing/debugging nginx config
* 10:22 Urbanecm: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Victorgrigas . # [[phab:T274608|T274608]]
* 10:18 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 50%', diff saved to https://phabricator.wikimedia.org/P14335 and previous config saved to /var/cache/conftool/dbconfig/20210212-101814-jynus.json
* 10:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2004.codfw.wmnet
* 10:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4032.ulsfo.wmnet
* 10:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4026.ulsfo.wmnet
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1086.eqiad.wmnet
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3061.esams.wmnet
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3060.esams.wmnet
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1085.eqiad.wmnet
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2038.codfw.wmnet
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5012.eqsin.wmnet
* 10:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2037.codfw.wmnet
* 10:02 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5006.eqsin.wmnet
* 10:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2004.codfw.wmnet
* 09:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2003.codfw.wmnet
* 09:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5006.eqsin.wmnet
* 09:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5012.eqsin.wmnet
* 09:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4026.ulsfo.wmnet
* 09:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4032.ulsfo.wmnet
* 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3061.esams.wmnet
* 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3060.esams.wmnet
* 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2038.codfw.wmnet
* 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2037.codfw.wmnet
* 09:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1086.eqiad.wmnet
* 09:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1085.eqiad.wmnet
* 09:46 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2003.codfw.wmnet
* 09:45 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 30%', diff saved to https://phabricator.wikimedia.org/P14334 and previous config saved to /var/cache/conftool/dbconfig/20210212-094520-jynus.json
* 09:32 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 20%', diff saved to https://phabricator.wikimedia.org/P14333 and previous config saved to /var/cache/conftool/dbconfig/20210212-093211-jynus.json
* 09:31 moritzm: installing node-y18n security updates
* 08:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2002.wikimedia.org with reason: REIMAGE
* 08:29 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on bast2002.wikimedia.org with reason: REIMAGE
* 08:25 jynus@cumin1001: dbctl commit (dc=all): 'Increase db1163 traffic to 10%', diff saved to https://phabricator.wikimedia.org/P14331 and previous config saved to /var/cache/conftool/dbconfig/20210212-082526-jynus.json
* 08:15 moritzm: reimaging bast2002 to buster
* 07:54 elukey: roll restart of druid brokers on druid-public - locked after scheduled datasource deletion
* 03:36 krinkle@deploy1001: Finished deploy [integration/docroot@3c943ba]: {{Gerrit|I89e1ec881}} (duration: 00m 08s)
* 03:36 krinkle@deploy1001: Started deploy [integration/docroot@3c943ba]: {{Gerrit|I89e1ec881}}
* 01:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1329.eqiad.wmnet
* 01:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1330.eqiad.wmnet
* 01:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1331.eqiad.wmnet
* 01:21 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1332.eqiad.wmnet
* 01:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1332.eqiad.wmnet
* 01:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1331.eqiad.wmnet
* 01:08 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1330.eqiad.wmnet
* 01:07 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1329.eqiad.wmnet
* 01:06 Urbanecm: Evening B&C done
* 01:04 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|389f7f1fdc9ad4a0c163ccfe1d80f2aaec7f8038}}: Enable DiscussionTools Reply Tool A/B test ([[phab:T273554|T273554]]) (duration: 01m 08s)
* 01:02 urbanecm@deploy1001: sync-file aborted: {{Gerrit|389f7f1fdc9ad4a0c163ccfe1d80f2aaec7f8038}}: Enable DiscussionTools Reply Tool A/B test (duration: 00m 48s)
* 01:01 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.27/extensions/VisualEditor/: {{Gerrit|c86cd00076c9f1857f4bafb04a15640ad66da863}}: {{Gerrit|de4a562d3baec77c85bfa05ba59778b882a6f9d2}}: VE backports ([[phab:T273096|T273096]]) (duration: 01m 15s)
* 00:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5d92ed15c51d57f43bad054d0469f54848b84d6a}}: Add import sources for zh_yuewiki ([[phab:T274597|T274597]]) (duration: 01m 13s)
* 00:34 foks: removing 2 files for legal compliance
* 00:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a022f2b506089ab518b74c1dfca78924c06dc80f}}: Oversample DiscussionTools EditAttemptStep logging ([[phab:T273946|T273946]]) (duration: 01m 08s)
* 00:30 Urbanecm: mwscript namespaceDupes.php itwikiquote --fix --add-prefix=BROKEN # [[phab:T273362|T273362]]
* 00:29 Urbanecm: mwscript namespaceDupes.php itwikiquote --fix # [[phab:T273362|T273362]]
* 00:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f051c6cdaa162ce2ea42aa53a24e50bb4aa8a793}}: Adding WQ as namespace alias for itwikiquote ([[phab:T273362|T273362]]) (duration: 01m 10s)
* 00:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|53229b0f41eb8cc3e8a90157283913c7d69810df}}: Enabling extension SandboxLink on ltwiki ([[phab:T273957|T273957]]) (duration: 01m 07s)
* 00:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mwdebug2002.codfw.wmnet with reason: OS upgrade
* 00:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mwdebug2002.codfw.wmnet with reason: OS upgrade
* 00:07 ejegg: updated fundraising civicrm from {{Gerrit|b81cb5e702}} to {{Gerrit|dfbb8f41bc}}


== 2021-02-11 ==
== 2021-07-30 ==
* 23:50 Urbanecm: Deploy security patch for [[phab:T274514|T274514]]
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:47 mutante: reimaged mwdebug2002 with buster - since this is a VM: manually cleaned puppet cert on puppetmaster1001, signed new cert for same hostname, initial puppet run etc ([[phab:T274023|T274023]])
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 23:44 twentyafterfour: Train status for wmf.30 ([[phab:T271344|T271344]]) is blocked until monday. leaving wmf.30 on group1 and wmf.27 on group2 in spite of [[phab:T260401|T260401]]
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog1002.eqiad.wmnet with reason: REIMAGE
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog1002.eqiad.wmnet with reason: REIMAGE
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 23:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwdebug2002.codfw.wmnet with reason: OS upgrade
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 23:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwdebug2002.codfw.wmnet with reason: OS upgrade
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 23:20 mutante: reimaging mwdebug2002 - stretch -> buster
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 22:57 Urbanecm: Run scap pull at mwmaint1002
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 22:53 mutante: powercycling crashed mwmaint1002
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 22:53 Urbanecm: Deploy security patch for [[phab:T274514|T274514]]
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 22:11 legoktm@deploy1001: Synchronized php-1.36.0-wmf.30/extensions/GlobalWatchlist: GlobalWatchlist backports (duration: 01m 11s)
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 22:05 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1332.eqiad.wmnet with reason: REIMAGE
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 22:03 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1331.eqiad.wmnet with reason: REIMAGE
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 22:03 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1332.eqiad.wmnet with reason: REIMAGE
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:01 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1330.eqiad.wmnet with reason: REIMAGE
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:01 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1331.eqiad.wmnet with reason: REIMAGE
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:59 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1329.eqiad.wmnet with reason: REIMAGE
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 21:59 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1330.eqiad.wmnet with reason: REIMAGE
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:57 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1329.eqiad.wmnet with reason: REIMAGE
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 21:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1354.eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 21:50 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1354.eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 21:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1355.eqiad.wmnet
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 21:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1355.eqiad.wmnet
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:44 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1359.eqiad.wmnet
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 21:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1359.eqiad.wmnet
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 21:37 mutante: mw1355, mw1359 - power cycling
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1354.eqiad.wmnet with reason: REIMAGE
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 21:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1354.eqiad.wmnet with reason: REIMAGE
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 21:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1360.eqiad.wmnet
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 21:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1360.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 21:05 mutante: mw1360 - powercycling
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 21:01 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1364.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 20:59 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1364.eqiad.wmnet
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 20:52 mutante: mw1364 - powercycled
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 20:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1355.eqiad.wmnet with reason: REIMAGE
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 20:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1355.eqiad.wmnet with reason: REIMAGE
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1359.eqiad.wmnet with reason: REIMAGE
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 20:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1359.eqiad.wmnet with reason: REIMAGE
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 20:26 twentyafterfour: new train blocker preventing deploy of 1.36.0-wmf.30 to all wikis. [[phab:T274589|T274589]] blocks [[phab:T271344|T271344]]
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 20:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1365.eqiad.wmnet
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 20:23 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1365.eqiad.wmnet
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 20:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1360.eqiad.wmnet with reason: REIMAGE
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 20:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1360.eqiad.wmnet with reason: REIMAGE
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 20:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1361.eqiad.wmnet
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 20:09 mutante: mw1365 - powercycle - reboot issue
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 20:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1361.eqiad.wmnet
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 20:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1364.eqiad.wmnet with reason: REIMAGE
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 20:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1364.eqiad.wmnet with reason: REIMAGE
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 19:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1362.eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 19:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1362.eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 19:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1368.eqiad.wmnet
* 11:23 moritzm: installing libsndfile security updates on stretch
* 19:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1361.eqiad.wmnet with reason: REIMAGE
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 19:40 mutante: mw1368 - had the reboot via IPMI issue, did DRAC reset and repeated wmf-autoreimage, issue did not happen again
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 19:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1368.eqiad.wmnet
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 19:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1361.eqiad.wmnet with reason: REIMAGE
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 19:32 urbanecm@deploy1001: Synchronized wmf-config/logos.php: noop: {{Gerrit|a1244df3e829abc793113a7e32d1972db9f780a8}}: Add inline documentation to configuration about updating logos regarding labs (duration: 01m 08s)
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 19:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1365.eqiad.wmnet with reason: REIMAGE
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 19:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|93e168cb7788c772895b47f239275544fb745358}}: Added Kokebok namespace to nowikibooks ([[phab:T274265|T274265]]) (duration: 01m 20s)
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 19:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1362.eqiad.wmnet with reason: REIMAGE
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 19:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1365.eqiad.wmnet with reason: REIMAGE
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 19:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1362.eqiad.wmnet with reason: REIMAGE
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 19:20 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 19:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1363.eqiad.wmnet
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 19:13 robh@cumin1001: START - Cookbook sre.dns.netbox
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 19:13 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1363.eqiad.wmnet
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json
* 19:13 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 19:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
* 19:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
* 19:04 mutante: mw1363 - powercycled, reboot issue
* 18:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1374.eqiad.wmnet
* 18:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1374.eqiad.wmnet
* 18:46 mutante: mw1368 - racadm racreset
* 18:46 mutante: mw1368 - reboot via IPMI issue & can't powercycle "Unable to perform requested operation." - racreset
* 18:43 mutante: mw1374 - powercycled, reboot via ipmi issue
* 18:19 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:11 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
* 17:59 bblack: lvs2007 - downtimes ended, back in service - [[phab:T274571|T274571]]
* 17:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1363.eqiad.wmnet with reason: REIMAGE
* 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
* 17:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1363.eqiad.wmnet with reason: REIMAGE
* 17:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1374.eqiad.wmnet with reason: REIMAGE
* 17:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1374.eqiad.wmnet with reason: REIMAGE
* 17:52 bblack: lvs2007 - starting up puppet + pybal - [[phab:T274571|T274571]]
* 17:36 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1375.eqiad.wmnet
* 17:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1375.eqiad.wmnet
* 17:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1376.eqiad.wmnet
* 17:31 bblack: lvs2007 - shutting down host - [[phab:T274571|T274571]]
* 17:27 bblack: lvs2007 - stopping pybal - [[phab:T274571|T274571]]
* 17:26 bblack: lvs2007 - puppet disabled, downtimed in icinga - [[phab:T274571|T274571]]
* 17:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1376.eqiad.wmnet
* 17:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:07 mutante: mw1375 - powercycle - stuck at reboot
* 17:03 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1376.eqiad.wmnet
* 16:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1368.eqiad.wmnet with reason: cumin execution failed during wmf-reimaged
* 16:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: cumin execution failed during wmf-reimaged
* 16:38 mutante: mw1368 - File "/usr/lib/python3/dist-packages/spicerack/remote.py", line 637, in _execute  raise RemoteExecutionError(ret, 'Cumin execution failed')
* 16:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
* 16:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1375.eqiad.wmnet with reason: REIMAGE
* 16:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1376.eqiad.wmnet with reason: REIMAGE
* 16:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1375.eqiad.wmnet with reason: REIMAGE
* 16:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1368.eqiad.wmnet with reason: REIMAGE
* 16:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1376.eqiad.wmnet with reason: REIMAGE
* 16:24 ejegg: updated payments-wiki from {{Gerrit|a232fc3438}} to {{Gerrit|4b7b195c8a}}
* 16:13 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1163 at 1%, again [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14323 and previous config saved to /var/cache/conftool/dbconfig/20210211-161308-kormat.json
* 15:52 jynus: deploying fixed grants to db1163
* 15:50 gehel: ban elastic2054 from shard allocation - [[phab:T274555|T274555]]
* 15:49 jynus@cumin1001: dbctl commit (dc=all): 'Depool 1163', diff saved to https://phabricator.wikimedia.org/P14321 and previous config saved to /var/cache/conftool/dbconfig/20210211-154902-jynus.json
* 15:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org
* 15:46 gehel: depooling elastic2054 - [[phab:T274555|T274555]]
* 15:45 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host serpens.wikimedia.org
* 15:45 kormat@cumin1001: dbctl commit (dc=all): 'Pool db1163 at 1% [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14320 and previous config saved to /var/cache/conftool/dbconfig/20210211-154501-kormat.json
* 15:39 gehel: powercycle elastic2054 - [[phab:T274555|T274555]]
* 15:39 gehel: powercycle elastic2054
* 14:44 kormat@cumin1001: dbctl commit (dc=all): 'Add db1163 to s1 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14318 and previous config saved to /var/cache/conftool/dbconfig/20210211-144445-kormat.json
* 14:24 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreams: Update sampling config syntax for test.instrumentation.sampled (duration: 01m 08s)
* 14:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2001.wikimedia.org
* 14:02 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host netmon2001.wikimedia.org
* 13:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
* 13:48 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
* 13:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
* 13:41 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
* 13:28 godog: test grafana 7.4.1 upgrade on grafana2001 - [[phab:T263747|T263747]]
* 13:27 moritzm: re-adding ganeti5002 to the eqsin Ganeti cluster following mainboard replacement/reinstall [[phab:T261130|T261130]]
* 13:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
* 13:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
* 13:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
* 13:04 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
* 13:00 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 12:57 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 12:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
* 12:45 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
* 12:41 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
* 12:40 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 12:40 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d2b1df105afd9f9c9c047ae9c0a434674f43d505}}: Changing frwiktionary wmgBabelMainCategory ([[phab:T274137|T274137]]) (duration: 01m 08s)
* 12:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
* 12:35 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 12:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:662967{{!}}wikidata: post edit constraint jobs on 50% of edits (T204031)]] (up from 40%) (duration: 01m 08s)
* 12:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:662970{{!}}wikidata: add Dagbani to wmgExtraLanguageNames (T272242)]] (duration: 01m 29s)
* 12:06 jynus: restart-failed systemd on cumin1001 after s5 eqiad snapshot failed
* 11:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2002.codfw.wmnet
* 11:45 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 11:41 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 11:40 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 11:39 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1163.eqiad.wmnet with reason: REIMAGE
* 11:39 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2002.codfw.wmnet
* 11:35 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 11:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2001.codfw.wmnet
* 11:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2001.codfw.wmnet
* 11:25 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1004.eqiad.wmnet
* 11:17 mvolz@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 11:13 mvolz@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 11:06 mvolz@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 11:04 kormat@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: changed binlog_format [[phab:T274472|T274472]]', diff saved to https://phabricator.wikimedia.org/P14315 and previous config saved to /var/cache/conftool/dbconfig/20210211-110447-kormat.json
* 11:03 moritzm: installing firejail security updates on Stretch
* 10:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
* 10:50 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
* 10:49 kormat@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 66%: changed binlog_format [[phab:T274472|T274472]]', diff saved to https://phabricator.wikimedia.org/P14314 and previous config saved to /var/cache/conftool/dbconfig/20210211-104943-kormat.json
* 10:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
* 10:40 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
* 10:39 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
* 10:34 kormat@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 33%: changed binlog_format [[phab:T274472|T274472]]', diff saved to https://phabricator.wikimedia.org/P14313 and previous config saved to /var/cache/conftool/dbconfig/20210211-103440-kormat.json
* 10:33 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
* 10:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
* 10:20 kormat@cumin1001: dbctl commit (dc=all): 'db1118 depooling: change binlog_format', diff saved to https://phabricator.wikimedia.org/P14312 and previous config saved to /var/cache/conftool/dbconfig/20210211-101959-kormat.json
* 10:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1118.eqiad.wmnet with reason: Depooling to change binlog_format [[phab:T274472|T274472]]
* 10:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1118.eqiad.wmnet with reason: Depooling to change binlog_format [[phab:T274472|T274472]]
* 10:18 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
* 10:15 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
* 10:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4025.ulsfo.wmnet
* 10:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4031.ulsfo.wmnet
* 10:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2035.codfw.wmnet
* 10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1084.eqiad.wmnet
* 10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3059.esams.wmnet
* 10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1083.eqiad.wmnet
* 10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3058.esams.wmnet
* 10:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2036.codfw.wmnet
* 10:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5005.eqsin.wmnet
* 10:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5011.eqsin.wmnet
* 10:07 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
* 10:02 jynus: switching db1118 to binlog_format=STATEMENT as new s1 master candidate
* 10:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5005.eqsin.wmnet
* 10:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5011.eqsin.wmnet
* 10:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4025.ulsfo.wmnet
* 09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4031.ulsfo.wmnet
* 09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3059.esams.wmnet
* 09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3058.esams.wmnet
* 09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2036.codfw.wmnet
* 09:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2035.codfw.wmnet
* 09:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1084.eqiad.wmnet
* 09:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1083.eqiad.wmnet
* 09:53 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thumbor1004.eqiad.wmnet
* 09:44 akosiaris@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 09:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
* 09:43 akosiaris@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 09:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
* 09:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2001.codfw.wmnet
* 09:12 akosiaris@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 09:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1003.eqiad.wmnet
* 09:09 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host rpki2001.codfw.wmnet
* 09:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
* 09:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
* 08:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1004.eqiad.wmnet
* 08:59 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=thumbor1003.eqiad.wmnet
* 08:52 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
* 08:48 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
* 08:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1003.eqiad.wmnet
* 08:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
* 08:35 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 08:29 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1003.eqiad.wmnet
* 08:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3005.wikimedia.org
* 08:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast3005.wikimedia.org
* 08:11 legoktm@deploy1001: Synchronized php-1.36.0-wmf.30/vendor/wikimedia/shellbox/src/Command/BashWrapper.php: wikimedia/shellbox: Don't unconditionally allowPath( 'limit.sh' ) - [[phab:T274474|T274474]] (duration: 01m 32s)
* 08:09 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1002.eqiad.wmnet
* 08:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4003.wikimedia.org
* 08:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast4003.wikimedia.org
* 07:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5002.wikimedia.org
* 07:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host bast5002.wikimedia.org
* 07:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1021.eqiad.wmnet
* 07:44 XioNoX: push improved loopback dhcp term to all routers
* 07:39 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1021.eqiad.wmnet
* 07:25 effie: pool thumbor1001
* 07:06 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=thumbor1001.eqiad.wmnet
* 07:06 elukey: powercycle thumbor1001 - no ssh, no mgmt serial tty available, no racadm getsel infos
* 06:45 kart_: Updated cxserver to 2021-02-10-134029-production ([[phab:T274133|T274133]], [[phab:T273456|T273456]], [[phab:T271980|T271980]])
* 06:41 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:35 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:33 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 03:10 rzl@cumin1001: dbctl commit (dc=all): 'depool db1134', diff saved to https://phabricator.wikimedia.org/P14310 and previous config saved to /var/cache/conftool/dbconfig/20210211-031048-rzl.json
* 03:10 rzl: depooled db1134
* 02:18 milimetric@deploy1001: Finished deploy [analytics/refinery@01d811f] (thin): Fix spelling error in mediacounts job (duration: 00m 06s)
* 02:18 milimetric@deploy1001: Started deploy [analytics/refinery@01d811f] (thin): Fix spelling error in mediacounts job
* 02:18 milimetric@deploy1001: Finished deploy [analytics/refinery@01d811f]: Fix spelling error in mediacounts job (duration: 11m 06s)
* 02:07 milimetric@deploy1001: Started deploy [analytics/refinery@01d811f]: Fix spelling error in mediacounts job
* 02:05 dwisehaupt: move payments1* and frpig1* out of maintenance mode
* 02:04 eileen: process-control config revision is {{Gerrit|726db3446a}}
* 02:02 dwisehaupt: move civi1001 out of maintenance mode
* 01:54 eileen: civicrm revision changed from {{Gerrit|3776363c90}} to {{Gerrit|b81cb5e702}}, config revision is {{Gerrit|f216d8fe8e}}
* 01:35 dwisehaupt: applying new civicrm triggers to frdb1002
* 01:14 eileen: civicrm revision changed from {{Gerrit|2ce8194c07}} to {{Gerrit|3776363c90}}, config revision is {{Gerrit|f216d8fe8e}}
* 01:06 dwisehaupt: stopping mariadb replication on frdev1001 and frdb1004
* 01:05 dwisehaupt: Move payments/civi/frpig into maint mode for civi upgrade
* 01:04 eileen: process-control config revision is {{Gerrit|f216d8fe8e}}
* 00:26 legoktm@deploy1001: Synchronized wmf-config/profiler.php: Revert "profiler: Send data to excimer-buster pipeline" (duration: 02m 00s)
* 00:03 milimetric@deploy1001: Finished deploy [analytics/refinery@3da19b6] (thin): More fixes for jobs after cluster upgrade (duration: 00m 07s)
* 00:03 milimetric@deploy1001: Started deploy [analytics/refinery@3da19b6] (thin): More fixes for jobs after cluster upgrade


== 2021-02-10 ==
== 2021-07-29 ==
* 23:53 milimetric@deploy1001: Finished deploy [analytics/refinery@3da19b6]: More fixes for jobs after cluster upgrade (duration: 14m 23s)
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 23:49 legoktm@cumin1001: conftool action :
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:11 vgutierrez: restart pybal on lvs2009
* 14:09 vgutierrez: restart pybal on lvs2010
* 14:07 vgutierrez: restart pybal on lvs2008
* 14:05 vgutierrez: restart pybal on lvs2007
* 13:59 vgutierrez: restart pybal on lvs1014
* 13:55 vgutierrez: restart pybal on lvs1015
* 13:52 _joe_: restarting pybal on lvs1016
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1293.eqiad.wmnet with reason: REIMAGE
* 19:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:47 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1293.eqiad.wmnet with reason: REIMAGE
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1378.eqiad.wmnet with reason: REIMAGE
* 18:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1378.eqiad.wmnet with reason: REIMAGE
* 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:20 thcipriani@deploy1001: Synchronized wmf-config/ProductionServices.php: [[gerrit:661732{{!}}Remove a couple of useless DNS lookups from mediawiki-config]] [[phab:T231025|T231025]] (duration: 01m 10s)
* 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1294.eqiad.wmnet
* 18:22 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1294.eqiad.wmnet
* 18:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1379.eqiad.wmnet
* 18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:16 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1370.eqiad.wmnet with reason: REIMAGE
* 15:59 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:15 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
* 15:54 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 19:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1370.eqiad.wmnet with reason: REIMAGE
* 15:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1294.eqiad.wmnet
* 15:17 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dragonfly-supernode1001.eqiad.wmnet
* 19:04 mutante: mw1379 - racadm racreset - host did not come back from reboot and DRAC says it can't powercycle it, while it is also ALREADY ON
* 15:07 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 19:00 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1324.eqiad.wmnet with reason: REIMAGE
* 15:05 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dragonfly-supernode1001.eqiad.wmnet
* 19:00 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1379.eqiad.wmnet
* 15:02 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 18:58 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1323.eqiad.wmnet with reason: REIMAGE
* 14:54 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 18:58 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1324.eqiad.wmnet with reason: REIMAGE
* 14:53 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
* 18:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1371.eqiad.wmnet
* 14:52 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 18:56 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1322.eqiad.wmnet with reason: REIMAGE
* 14:40 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[0-1].eqiad.wmnet
* 18:56 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1323.eqiad.wmnet with reason: REIMAGE
* 14:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-9].eqiad.wmnet
* 18:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1321.eqiad.wmnet with reason: REIMAGE
* 14:38 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
* 18:54 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1322.eqiad.wmnet with reason: REIMAGE
* 14:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw142[0-1].eqiad.wmnet
* 18:52 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1321.eqiad.wmnet with reason: REIMAGE
* 14:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-9].eqiad.wmnet
* 18:36 andrew@deploy1001: Finished deploy [horizon/deploy@02cb8a4]: security group dashboard policy updates, now after doing a submodule update! (duration: 03m 31s)
* 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw142[0-1].eqiad.wmnet
* 18:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1379.eqiad.wmnet
* 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw141[4-9].eqiad.wmnet
* 18:33 andrew@deploy1001: Started deploy [horizon/deploy@02cb8a4]: security group dashboard policy updates, now after doing a submodule update!
* 14:15 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw142[0-1].eqiad.wmnet
* 18:32 andrew@deploy1001: Finished deploy [horizon/deploy@4f5a5a7]: security group dashboard policy updates (duration: 00m 07s)
* 14:12 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw141[4-9].eqiad.wmnet
* 18:32 andrew@deploy1001: Started deploy [horizon/deploy@4f5a5a7]: security group dashboard policy updates
* 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry[2005-2008].codfw.wmnet
* 18:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1371.eqiad.wmnet
* 13:54 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry[2005-2008].codfw.wmnet
* 18:20 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1294.eqiad.wmnet with reason: REIMAGE
* 13:32 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=registry200[5-8].codfw.wmnet,dc=codfw,cluster=docker-registry
* 18:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1294.eqiad.wmnet with reason: REIMAGE
* 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 18:14 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thumbor1001.eqiad.wmnet
* 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 17:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1295.eqiad.wmnet
* 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 17:43 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1295.eqiad.wmnet
* 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 17:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1001.eqiad.wmnet
* 13:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
* 17:18 shdubsh: restart pybal on low-traffic lvs1015
* 13:11 mutante: mw2380 - rebooting
* 17:13 shdubsh: restart pybal on backup lvs1016
* 13:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 17:13 andrew@deploy1001: Finished deploy [horizon/deploy@4f5a5a7]: puppet dashboard policy updates (duration: 03m 53s)
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 17:09 andrew@deploy1001: Started deploy [horizon/deploy@4f5a5a7]: puppet dashboard policy updates
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
* 16:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1295.eqiad.wmnet with reason: REIMAGE
* 12:24 moritzm: added btullis to pwstore
* 16:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1295.eqiad.wmnet with reason: REIMAGE
* 12:06 mutante: mw2380 /puppetmaster: reimaged, revoking old cert, signing new cert, initial puppet run [[phab:T285603|T285603]]
* 16:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1379.eqiad.wmnet with reason: REIMAGE
* 11:51 mutante: mw2380 - PXE booting - does not boot from hard disk
* 16:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1379.eqiad.wmnet with reason: REIMAGE
* 11:28 mutante: powercycling mw2380, trying to make it boot
* 16:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1371.eqiad.wmnet with reason: REIMAGE
* 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 16:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1371.eqiad.wmnet with reason: REIMAGE
* 11:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 16:20 moritzm: installing unzip security updates
* 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 16:12 moritzm: installing atftp security updates
* 11:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 16:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
* 10:33 jforrester@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/WikibaseMediaInfo: UploadWizard/WikibaseMediaInfo fix {{Gerrit|3fd2873}} for [[phab:T285579{{!}}T285579]] (duration: 00m 59s)
* 16:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
* 09:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1268.eqiad.wmnet
* 15:26 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Do not produce canary events for rdf-streaming-updater streams - [[phab:T269619|T269619]] (duration: 01m 13s)
* 09:37 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:702808{{!}}Fix handling of geEnabled flag (T285996)]] (duration: 00m 57s)
* 15:11 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.30
* 09:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1268.eqiad.wmnet
* 15:05 hashar: group0 wikis to 1.36.0-wmf.30 [[phab:T271344|T271344]]
* 09:24 godog: test thanos 0.21.1 locally on thanos-fe2001 and depool the host - [[phab:T285835|T285835]]
* 14:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2033.codfw.wmnet
* 09:19 dcausse: restart blazegraph on wdqs1013
* 14:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4024.ulsfo.wmnet
* 09:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1267.eqiad.wmnet
* 14:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3057.esams.wmnet
* 09:04 mutante: decom'ing mw1267
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1082.eqiad.wmnet
* 09:02 moritzm: installing node-hosted-git-info security updates
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1081.eqiad.wmnet
* 09:02 tgr: deploying emergency backport: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/702808
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5010.eqsin.wmnet
* 08:54 moritzm: installing golang-docker-credential-helpers security updates
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4030.ulsfo.wmnet
* 08:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1267.eqiad.wmnet
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3056.esams.wmnet
* 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 14:51 jynus: updating puppet-compiler-facts
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 14:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5004.eqsin.wmnet
* 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 14:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2034.codfw.wmnet
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 14:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2033.codfw.wmnet
* 08:03 moritzm: installing ipmitool security updates
* 14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5004.eqsin.wmnet
* 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1268.eqiad.wmnet
* 14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5010.eqsin.wmnet
* 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1267.eqiad.wmnet
* 14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4024.ulsfo.wmnet
* 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4030.ulsfo.wmnet
* 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3057.esams.wmnet
* 07:25 dcausse: installing openjdk-8-dbg on wdqs1013
* 14:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3056.esams.wmnet
* 03:14 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo run-puppet-agent --force'`
* 14:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2034.codfw.wmnet
* 03:11 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo apt update'` fixed the issue
* 14:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1082.eqiad.wmnet
* 03:07 ryankemper: [[phab:T264053|T264053]] `Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install elasticsearch-madvise' returned 100: Reading package lists...` grr
* 14:39 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1081.eqiad.wmnet
* 03:07 ryankemper: [[phab:T264053|T264053]] `ryankemper@elastic2054:~$ sudo run-puppet-agent --force`
* 12:26 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T269619|T269619]]: [wdqs] Add flink sideoutput stream definitions (duration: 01m 06s)
* 03:06 ryankemper: [[phab:T264053|T264053]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/702791; will run puppet on single host
* 12:20 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:658321{{!}}Remove Wikibase.NewItemIdFormatter log channel (T268870)]] 2/2 (prod no-op) (duration: 01m 08s)
* 03:05 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo disable-puppet "verify new deb package works - [[phab:T264053|T264053]]"'`
* 12:18 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:658321{{!}}Remove Wikibase.NewItemIdFormatter log channel (T268870)]] 1/2 (duration: 01m 07s)
* 03:02 legoktm: uploaded elasticsearch-madvise_0.1~deb9u1_amd64.changes to stretch-wikimedia on apt1001
* 12:11 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e8214ee812f3812f609c26d6422b85a99a91e1f6}}: Enable GrowthExperiments on bnwiki ([[phab:T266020|T266020]]) (duration: 01m 08s)
* 01:47 eileen: civicrm revision changed from {{Gerrit|e07c2be1a7}} to {{Gerrit|bb62188ec6}}, config revision is {{Gerrit|1739c53fcb}}
* 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2d8cb10f246904f1af07b019da270fd8dc7816fa}}: Set wgGEHelpPanelAskMentor to true for several wikis ([[phab:T272753|T272753]]) (duration: 01m 21s)
* 01:16 legoktm: uploaded elasticsearch-madvise 0.1 to apt.wm.o ([[phab:T264053|T264053]])
* 12:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5003.eqsin.wmnet
 
* 12:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4029.ulsfo.wmnet
== 2021-07-01 ==
* 11:56 vgutierrez: powercycle cp5003
* 23:29 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:702777{{!}}Revert "deployment training: readme whitespace"]] (duration: 00m 56s)
* 11:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3055.esams.wmnet
* 23:21 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:702774{{!}}deployment training: readme whitespace]] (duration: 00m 57s)
* 11:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5009.eqsin.wmnet
* 22:37 urbanecm: Start server-side upload for 1 video file ([[phab:T285182|T285182]])
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1080.eqiad.wmnet
* 22:36 urbanecm: Start server-side upload for 1 video file ([[phab:T285789|T285789]])
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3054.esams.wmnet
* 22:31 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:702704{{!}}Use train-versions.json to map from version to image tag (T282824)]] (duration: 00m 57s)
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1079.eqiad.wmnet
* 22:27 urbanecm: Start server-side upload for 1 video file ([[phab:T285682|T285682]])
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2032.codfw.wmnet
* 21:43 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: [[gerrit:702755{{!}}Temporarily disable notification for security patch failures]] (duration: 00m 57s)
* 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
* 19:45 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.12
* 11:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5003.eqsin.wmnet
* 19:41 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 12s)
* 11:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5009.eqsin.wmnet
* 19:39 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
* 11:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4029.ulsfo.wmnet
* 19:35 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/tests/phpunit/includes/TitleMethodsTest.php: Backport: [[gerrit:702711{{!}}Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
* 11:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3055.esams.wmnet
* 19:34 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/Title.php: Backport: [[gerrit:702711{{!}}Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
* 11:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3054.esams.wmnet
* 19:18 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/.pipeline/config.yaml: Backport: [[gerrit:702168{{!}}Trigger update-train-versions job at end of wmf-publish pipeline]] (duration: 01m 08s)
* 11:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2032.codfw.wmnet
* 18:55 otto@deploy1002: Finished deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883] (duration: 05m 19s)
* 11:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
* 18:50 otto@deploy1002: Started deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883]
* 11:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1080.eqiad.wmnet
* 18:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7995f7abe3b94eb0326064cbbd1d3111f8f21365}}: Use Vue.js for QuickSurveys on available wikis ([[phab:T285890|T285890]]) (duration: 01m 09s)
* 11:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1079.eqiad.wmnet
* 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|654877f92fa18ae766d693630025c69400cad3ac}}: EventDispatcher: Ensure we fetch page content from the primary database ([[phab:T285895|T285895]]) (duration: 01m 12s)
* 11:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4023.ulsfo.wmnet
* 18:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|6d9043087ec421e1321cd6797934928e2651b1c1}}: EventDispatcher: Ensure we fetch page content from the primary database ([[phab:T285895|T285895]]) (duration: 01m 14s)
* 11:22 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1404.eqiad.wmnet
* 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4023.ulsfo.wmnet
* 16:28 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.12"
* 11:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5008.eqsin.wmnet
* 16:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14301 and previous config saved to /var/cache/conftool/dbconfig/20210210-104649-root.json
* 16:23 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: [[phab:T285959|T285959]] (duration: 01m 20s)
* 10:42 vgutierrez: powercycle cp5008
* 16:11 vgutierrez: restart varnish-fe on cp3059 - [[phab:T285953|T285953]]
* 10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4028.ulsfo.wmnet
* 14:58 papaul: poweroff mw2380 for disk replacement
* 10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5002.eqsin.wmnet
* 14:57 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
* 10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4022.ulsfo.wmnet
* 14:53 effie: depool mw2380 for disk repair - [[phab:T285603|T285603]]
* 10:39 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3052.esams.wmnet
* 14:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:39 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2030.codfw.wmnet
* 14:51 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3053.esams.wmnet
* 14:45 moritzm: installing glib2.0 security updates on buster
* 10:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2029.codfw.wmnet
* 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts maps2002.codfw.wmnet
* 10:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1078.eqiad.wmnet
* 13:35 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts maps2002.codfw.wmnet
* 10:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1077.eqiad.wmnet
* 13:03 marostegui: Deploy schema change on s2 eqiad master [[phab:T276150|T276150]]
* 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 80%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14300 and previous config saved to /var/cache/conftool/dbconfig/20210210-103146-root.json
* 12:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1266.eqiad.wmnet
* 10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5002.eqsin.wmnet
* 12:39 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1266.eqiad.wmnet
* 10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5008.eqsin.wmnet
* 12:37 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4022.ulsfo.wmnet
* 12:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1264-1265].eqiad.wmnet
* 10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4028.ulsfo.wmnet
* 12:23 tgr: EU deploys done
* 10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3053.esams.wmnet
* 12:22 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/: Backport: [[gerrit:702402{{!}}Welcome tour: Mark as complete when notice is shown (T284800)]] [[gerrit:702404{{!}}SuggestedEdits: Return default JS data as 'noresults' (T285906)]] (duration: 01m 08s)
* 10:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3052.esams.wmnet
* 12:20 tgr@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: Backport: [[gerrit:702401{{!}}Welcome tour: Mark as complete when notice is shown (T284800)]] [[gerrit:702403{{!}}SuggestedEdits: Return default JS data as 'noresults' (T285906)]] (duration: 01m 09s)
* 10:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2030.codfw.wmnet
* 12:19 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1264-1265].eqiad.wmnet
* 10:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2029.codfw.wmnet
* 12:09 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1263.eqiad.wmnet
* 10:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1078.eqiad.wmnet
* 11:58 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1263.eqiad.wmnet
* 10:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1077.eqiad.wmnet
* 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/: Backport: [[gerrit:702400{{!}}Stop using legacy entityNamespaces setting in onSetupAfterCache hook (T285472)]] (duration: 01m 15s)
* 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 60%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14299 and previous config saved to /var/cache/conftool/dbconfig/20210210-101642-root.json
* 11:46 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1262.eqiad.wmnet
* 10:16 moritzm: installing firejail security updates
* 11:35 elukey: reboot ml-serve-ctrl200[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
* 10:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:35 marostegui: Deploy schema change on s8 eqiad master [[phab:T276150|T276150]]
* 10:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4021.ulsfo.wmnet
* 11:33 elukey: reboot ml-serve-ctrl100[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
* 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 11:33 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1262.eqiad.wmnet
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 40%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14298 and previous config saved to /var/cache/conftool/dbconfig/20210210-100139-root.json
* 11:19 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 100%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14297 and previous config saved to /var/cache/conftool/dbconfig/20210210-100111-root.json
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:697851{{!}}Avoid using MWNamespace]] (duration: 01m 06s)
* 10:00 vgutierrez: power cycling cp4021
* 11:07 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5007.eqsin.wmnet
* 10:27 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5001.eqsin.wmnet
* 10:05 moritzm: installing remaining libgcrypt20 security updates
* 09:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4027.ulsfo.wmnet
* 09:56 moritzm: installing remaining gnutls28 security updates
* 09:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3051.esams.wmnet
* 09:55 Amir1: start of clean up of autoreview logs in ruwiki ([[phab:T285608|T285608]])
* 09:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3050.esams.wmnet
* 09:47 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2027.codfw.wmnet
* 09:36 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 20%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14296 and previous config saved to /var/cache/conftool/dbconfig/20210210-094635-root.json
* 09:36 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2028.codfw.wmnet
* 09:35 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 75%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14295 and previous config saved to /var/cache/conftool/dbconfig/20210210-094608-root.json
* 09:35 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 09:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1075.eqiad.wmnet
* 09:05 marostegui: Deploy schema change on s1 eqiad (db1157) master [[phab:T277123|T277123]]
* 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1076.eqiad.wmnet
* 08:52 marostegui: Deploy schema change on s1 eqiad (db1163) master [[phab:T277123|T277123]]
* 09:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5001.eqsin.wmnet
* 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1261.eqiad.wmnet
* 09:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5007.eqsin.wmnet
* 08:28 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1261.eqiad.wmnet
* 09:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4021.ulsfo.wmnet
* 08:23 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw126[2-6].eqiad.wmnet
* 09:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4027.ulsfo.wmnet
* 08:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw126[2-6].eqiad.wmnet
* 09:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3051.esams.wmnet
* 08:13 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1261.eqiad.wmnet
* 09:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3050.esams.wmnet
* 08:11 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
* 09:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2028.codfw.wmnet
* 07:06 marostegui: Deploy schema change on s4 eqiad (db1138) master [[phab:T277123|T277123]]
* 09:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2027.codfw.wmnet
* 06:34 marostegui: Deploy schema change on s7 eqiad (db1136) master [[phab:T277123|T277123]]
* 09:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1076.eqiad.wmnet
* 06:31 marostegui: Deploy schema change on s2,s8 eqiad masters [[phab:T277123|T277123]]
* 09:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp1075.eqiad.wmnet
* 05:57 marostegui: Deploy schema change on s5 eqiad master (db1130) [[phab:T277123|T277123]]
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14294 and previous config saved to /var/cache/conftool/dbconfig/20210210-093132-root.json
* 05:55 marostegui: Deploy schema change on s6 eqiad master (db1173) [[phab:T277123|T277123]]
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 50%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14293 and previous config saved to /var/cache/conftool/dbconfig/20210210-093104-root.json
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P16750 and previous config saved to /var/cache/conftool/dbconfig/20210701-055243-marostegui.json
* 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14292 and previous config saved to /var/cache/conftool/dbconfig/20210210-093011-root.json
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P16749 and previous config saved to /var/cache/conftool/dbconfig/20210701-052702-marostegui.json
* 09:23 vgutierrez: rolling restart of cp nodes to catch up on kernel upgrades
* 04:48 marostegui: Disconnect eqiad -> codfw replication from s1-s8
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 25%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14290 and previous config saved to /var/cache/conftool/dbconfig/20210210-091601-root.json
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 80%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14289 and previous config saved to /var/cache/conftool/dbconfig/20210210-091507-root.json
* 09:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 09:10 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1076 (re)pooling @ 10%: Slowly repooling db1076 after cloning db1162', diff saved to https://phabricator.wikimedia.org/P14288 and previous config saved to /var/cache/conftool/dbconfig/20210210-090057-root.json
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 60%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14287 and previous config saved to /var/cache/conftool/dbconfig/20210210-090004-root.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 40%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14286 and previous config saved to /var/cache/conftool/dbconfig/20210210-084500-root.json
* 08:41 legoktm: depooling mw1404.eqiad.wmnet for perf benchmarking ([[phab:T274041|T274041]])
* 08:41 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1404.eqiad.wmnet
* 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 20%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14285 and previous config saved to /var/cache/conftool/dbconfig/20210210-082957-root.json
* 08:19 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - [[phab:T272836|T272836]]
* 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Slowly repool db1127', diff saved to https://phabricator.wikimedia.org/P14284 and previous config saved to /var/cache/conftool/dbconfig/20210210-081453-root.json
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127 [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P14283 and previous config saved to /var/cache/conftool/dbconfig/20210210-080512-marostegui.json
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool db1170:3312, db1170:3317 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14282 and previous config saved to /var/cache/conftool/dbconfig/20210210-064330-marostegui.json
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Give more weight to db1170:3312, db1170:3317 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14281 and previous config saved to /var/cache/conftool/dbconfig/20210210-063534-marostegui.json
* 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1170:3312, db1170:3317 with minimal weight for the first time [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14279 and previous config saved to /var/cache/conftool/dbconfig/20210210-061924-marostegui.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1170:3312 and db1170:3317 to dbctl, depooled [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14278 and previous config saved to /var/cache/conftool/dbconfig/20210210-061638-marostegui.json
* 06:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1020.eqiad.wmnet
* 06:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1020.eqiad.wmnet
* 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 to clone db1162 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14277 and previous config saved to /var/cache/conftool/dbconfig/20210210-055846-marostegui.json
* 03:46 ryankemper: `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph.service`
* 01:54 krinkle@deploy1001: Finished deploy [integration/docroot@0234db2]: Unbreak doc.wm.o (2) - {{Gerrit|Ib67da94fb1bdf0}} (duration: 00m 06s)
* 01:54 krinkle@deploy1001: Started deploy [integration/docroot@0234db2]: Unbreak doc.wm.o (2) - {{Gerrit|Ib67da94fb1bdf0}}
* 01:43 krinkle@deploy1001: Finished deploy [integration/docroot@fddc7c9]: Unbreak doc.wm.o - {{Gerrit|Ibf28e02ec03}} (duration: 00m 06s)
* 01:43 krinkle@deploy1001: Started deploy [integration/docroot@fddc7c9]: Unbreak doc.wm.o - {{Gerrit|Ibf28e02ec03}}
* 01:06 milimetric@deploy1001: Finished deploy [analytics/refinery@b539bf6] (thin): Job fixes after Hadoop upgrade (duration: 00m 06s)
* 01:06 milimetric@deploy1001: Started deploy [analytics/refinery@b539bf6] (thin): Job fixes after Hadoop upgrade
* 01:06 milimetric@deploy1001: Finished deploy [analytics/refinery@b539bf6]: Job fixes after Hadoop upgrade (duration: 10m 55s)
* 00:58 mutante: doc1001 - reloaded apache2
* 00:55 milimetric@deploy1001: Started deploy [analytics/refinery@b539bf6]: Job fixes after Hadoop upgrade
* 00:42 Amir1: changing frwiki to wmf.30 in mwdebug1002 to test [[phab:T264391|T264391]]
* 00:33 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.30/extensions/FeaturedFeeds: [[gerrit:662965{{!}}Fix issues with recent caching update]] ([[phab:T264391|T264391]]) (duration: 01m 10s)
* 00:22 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.30 (duration: 24m 10s)
* 00:01 twentyafterfour: train status: wmf.28 and wmf.29 are undeployed. wmf.27 is everywhere with the exception of testwikis, which is at wmf.30; refs [[phab:T271344|T271344]]


== 2021-02-09 ==
== 2021-06-30 ==
* 23:58 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.30
* 23:28 urbanecm: Evening B&C window finished
* 23:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
* 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|667d88054097b195208818aee15bb1eb58955437}}: Add Parsoid to wmgMonologChannels with warning level (duration: 01m 07s)
* 23:55 ryankemper: Depooled `wdqs1005` - it's catching up on hours of lag
* 23:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: {{Gerrit|8e719d54baa4c26aaa090e02503b0d9473301cce}}: Add Parsoid to wmgMonologChannels (duration: 00m 38s)
* 23:55 twentyafterfour@deploy1001: Finished scap: (no justification provided) (duration: 08m 43s)
* 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8e719d54baa4c26aaa090e02503b0d9473301cce}}: Add Parsoid to wmgMonologChannels (duration: 01m 07s)
* 23:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2250.codfw.wmnet
* 21:43 Amir1: deleting auto-review logs from test2wiki ([[phab:T285608|T285608]])
* 23:50 mutante: mw1383,mw1385 - scap pull, php
* 21:40 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T284931|T284931]] [[phab:T284459|T284459]] [[phab:T284394|T284394]])
* 23:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1296.eqiad.wmnet
* 21:29 cstone: civicrm revision changed from {{Gerrit|789c92d13b}} to {{Gerrit|e07c2be1a7}}
* 23:47 twentyafterfour: running scap sync-world
* 21:23 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T284931|T284931]] [[phab:T284459|T284459]] [[phab:T284394|T284394]])
* 23:47 twentyafterfour@deploy1001: Started scap: (no justification provided)
* 19:06 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 07s)
* 23:46 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.27
* 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
* 23:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1296.eqiad.wmnet
* 18:57 legoktm: legoktm@mwmaint2002:~$ sudo systemctl start mediawiki_job_purge_parsercache_pc[123] # to start split purge jobs ahead of the timers
* 23:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1380.eqiad.wmnet
* 18:54 legoktm: legoktm@mwmaint2002:~$ sudo systemctl stop mediawiki_job_parser_cache_purging.service # to stop zombie service
* 23:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1380.eqiad.wmnet
* 18:53 Amir1: adding urbanecm as admin of newprojects mailing list
* 23:28 mutante: mw1380 - powercycling after it did not come back from normal reboot during reimaging
* 18:12 Jeff_Green: authdns-update to deploy A/PTR records for frdev1002.frack.eqiad.wmnet
* 23:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1372.eqiad.wmnet
* 17:57 thcipriani: restart ci jenkins following upgrade
* 23:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1372.eqiad.wmnet
* 17:54 thcipriani: restart releases-jenkins following upgrade
* 23:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2250.codfw.wmnet with reason: REIMAGE
* 17:16 moritzm: imported jenkins 2.289.2 to thirdparty/ci [[phab:T285532|T285532]]
* 23:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2250.codfw.wmnet with reason: REIMAGE
* 16:30 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki 'Tech/Server_switch_2020' 'Tech/Server_switch' 'Martin Urbanec' --move-subpages --reason='per [[:phab:T285866]]' # [[phab:T285866|T285866]]
* 22:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1296.eqiad.wmnet with reason: REIMAGE
* 16:10 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 46s)
* 22:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1296.eqiad.wmnet with reason: REIMAGE
* 16:08 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 01s)
* 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1372.eqiad.wmnet with reason: REIMAGE
* 16:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 20s)
* 22:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1372.eqiad.wmnet with reason: REIMAGE
* 16:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 16s)
* 22:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2259.codfw.wmnet
* 16:03 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 17s)
* 22:31 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2259.codfw.wmnet
* 16:02 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating banwikisource ([[phab:T284389|T284389]])
* 22:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1373.eqiad.wmnet
* 16:00 urbanecm@deploy1002: Synchronized dblists: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 17s)
* 22:28 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1373.eqiad.wmnet
* 15:58 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 14s)
* 22:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1298.eqiad.wmnet
* 15:57 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 13s)
* 22:23 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1298.eqiad.wmnet
* 15:48 urbanecm@deploy1002: Synchronized langlist: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 16s)
* 22:23 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GlobalWatchlist extension on testwiki ([[phab:T260862|T260862]]) (duration: 02m 51s)
* 15:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 16s)
* 22:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2259.codfw.wmnet with reason: REIMAGE
* 15:46 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 13s)
* 22:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1380.eqiad.wmnet with reason: REIMAGE
* 15:44 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 15s)
* 22:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2259.codfw.wmnet with reason: REIMAGE
* 15:43 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating shiwiki ([[phab:T284885|T284885]])
* 21:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1373.eqiad.wmnet with reason: REIMAGE
* 15:41 urbanecm@deploy1002: Synchronized dblists: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 14s)
* 21:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1380.eqiad.wmnet with reason: REIMAGE
* 15:40 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 14s)
* 21:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1373.eqiad.wmnet with reason: REIMAGE
* 15:38 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 14s)
* 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2260.codfw.wmnet
* 15:31 urbanecm@deploy1002: Synchronized langlist: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 12s)
* 21:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1381.eqiad.wmnet
* 15:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 14s)
* 21:39 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1384.eqiad.wmnet
* 15:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 21:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2260.codfw.wmnet
* 15:27 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 21:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1384.eqiad.wmnet
* 15:26 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating dagwiki ([[phab:T284450|T284450]])
* 21:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1381.eqiad.wmnet
* 15:25 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=dagwiki --cluster=all # [[phab:T284450|T284450]]
* 21:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1298.eqiad.wmnet with reason: REIMAGE
* 15:24 urbanecm@deploy1002: Synchronized dblists: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 21:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1298.eqiad.wmnet with reason: REIMAGE
* 15:22 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 13s)
* 21:10 elukey: Analytics Hadoop cluster upgrade completed
* 15:21 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 21:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2260.codfw.wmnet with reason: REIMAGE
* 15:07 sukhe: restarted dnsdist.service and pdns-recursor.service on O:wikidough to install gnutls/gcrypt updates
* 21:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1381.eqiad.wmnet with reason: REIMAGE
* 15:06 urbanecm: sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1'
* 21:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1384.eqiad.wmnet with reason: REIMAGE
* 13:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 21:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2260.codfw.wmnet with reason: REIMAGE
* 13:26 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 21:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1381.eqiad.wmnet with reason: REIMAGE
* 13:26 moritzm: installing fluidsynth security updates on stretch
* 21:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1384.eqiad.wmnet with reason: REIMAGE
* 13:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
* 20:58 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1299.eqiad.wmnet
* 13:18 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
* 20:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1299.eqiad.wmnet
* 13:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
* 20:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2263.codfw.wmnet
* 13:11 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
* 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1382.eqiad.wmnet
* 13:04 mutante: switching docker-registry to nginx light variant [[phab:T164456|T164456]]
* 20:41 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1385.eqiad.wmnet
* 13:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
* 20:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1382.eqiad.wmnet
* 12:53 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
* 20:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1385.eqiad.wmnet
* 12:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
* 20:33 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2263.codfw.wmnet
* 12:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
* 20:21 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 12:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
* 20:13 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 12:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
* 20:12 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 12:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
* 20:12 otto@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - otto@cumin1001
* 12:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
* 20:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 12:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
* 20:11 otto@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - otto@cumin1001
* 12:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
* 20:10 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 12:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
* 20:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1299.eqiad.wmnet with reason: REIMAGE
* 12:17 kart_: Updated cxserver to 2021-06-30-112813-production ([[phab:T284900|T284900]], [[phab:T284885|T284885]])
* 20:08 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 12:13 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
* 20:07 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1299.eqiad.wmnet with reason: REIMAGE
* 12:11 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 20:06 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 12:06 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 20:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1385.eqiad.wmnet with reason: REIMAGE
* 12:01 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 20:00 twentyafterfour: prepping 1.36.0-wmf.30
* 11:46 Lucas_WMDE: EU backport+config window done
* 20:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1382.eqiad.wmnet with reason: REIMAGE
* 11:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:701505{{!}}Remove $wmgWikibaseClientRepoConceptBaseUri (T257260)]] (2/2, beta) (disregard the earlier /3, I’m skipping the test file after all) (duration: 01m 04s)
* 19:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1385.eqiad.wmnet with reason: REIMAGE
* 11:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:701505{{!}}Remove $wmgWikibaseClientRepoConceptBaseUri (T257260)]] (1/3, prod) (duration: 01m 16s)
* 19:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2263.codfw.wmnet with reason: REIMAGE
* 11:35 moritzm: rolling restart of FPM/Apache on mw canaries to pick up gnutls/gcrypt security updates
* 19:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1382.eqiad.wmnet with reason: REIMAGE
* 11:11 moritzm: installing libgcrypt security updates on buster
* 19:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2263.codfw.wmnet with reason: REIMAGE
* 11:09 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug2001:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1' # clean up old l10n cache
* 19:35 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2264.codfw.wmnet
* 11:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:701504{{!}}Stop setting Wikibase client repoConceptBaseUri (T257260)]] (duration: 01m 24s)
* 19:35 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 10:44 moritzm: installing gnutls security updates on buster
* 19:27 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1383.eqiad.wmnet
* 10:31 godog: add 200G to prometheus/eqiad for 'ops' instance
* 19:26 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 09:35 godog: start swiftrepl-mw on ms-fe2005 post-switchover (credentials were missing) - [[phab:T162123|T162123]]
* 19:23 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 08:51 jelto: jelto@puppetmaster1001:~$ sudo puppet cert -s gitlab2001.wikimedia.org # approve puppet certificate request for gitlab2001, fingerprint checked
* 19:21 ryankemper: [[phab:T262211|T262211]] `sudo cumin 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` on `ryankemper@cumin1001`
* 08:47 topranks: Removing BGP peers for AS48237 (Etihad Etisalat) and AS11404 (Wave Division Holdings) from cr2-eqiad (peers have left Equinix IX)
* 19:19 ryankemper: [[phab:T262211|T262211]] Attempting to bring `relforge100[3,4]` into service; merging https://gerrit.wikimedia.org/r/661229
* 08:31 godog: remove sdf1 from thanos-be1003 in swift - [[phab:T285835|T285835]]
* 19:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1300.eqiad.wmnet
* 07:43 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thanos-be1003.eqiad.wmnet
* 19:15 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2220.codfw.wmnet
* 07:43 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
* 19:10 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 07:43 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host thanos-be1003.eqiad.wmnet
* 19:08 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 07:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
* 19:04 elukey@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - elukey@cumin1001
* 05:46 ryankemper: [Cirrus] Unbanned `elastic2045`; now only `elastic2033` is banned in `codfw`
* 19:04 elukey@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - elukey@cumin1001
* 00:36 tstarling@deploy1002: Synchronized wmf-config/db-labs.php: gerrit 701995 SQL query log (duration: 01m 05s)
* 19:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 00:35 tstarling@deploy1002: Synchronized wmf-config/db-eqiad.php: gerrit 701995 SQL query log (duration: 01m 06s)
* 19:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1383.eqiad.wmnet
* 00:34 tstarling@deploy1002: Synchronized wmf-config/db-codfw.php: gerrit 701995 SQL query log (duration: 01m 06s)
* 19:01 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 00:32 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: gerrit 701995 SQL query log (duration: 01m 05s)
* 19:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2264.codfw.wmnet
* 00:31 tstarling@deploy1002: Synchronized docroot/noc/db.php: gerrit 701995 SQL query log (duration: 01m 06s)
* 18:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 00:27 tstarling@deploy1002: Synchronized wmf-config/logging.php: gerrit 701995 SQL query log (duration: 01m 15s)
* 18:57 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 00:01 urbanecm: (following up previous SAL item) TrainBranchBot was removed from wmf-deployment group because of [[phab:T285819|T285819]]
* 18:46 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:45 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:42 ryankemper: [[phab:T267927|T267927]] [WDQS Data Reload] `sudo cookbook sre.wdqs.data-reload wdqs1010.eqiad.wmnet --reuse-downloaded-dump --reload-data wikidata --reason '[[phab:T267927|T267927]]: Reload wikidata jnl from fresh dumps' --task-id [[phab:T267927|T267927]]` on `ryankemper@cumin1001` tmux session `wdqs_data_reload_1010`
* 18:41 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:40 ryankemper: [[phab:T267927|T267927]] [WDQS Data Reload] `sudo cookbook sre.wdqs.data-reload wdqs1009.eqiad.wmnet --reuse-downloaded-dump --reload-data wikidata --skolemize --reason '[[phab:T267927|T267927]]: Reload wikidata jnl from fresh dumps' --task-id [[phab:T267927|T267927]]` on `ryankemper@cumin1001` tmux session `wdqs_data_reload_1009`
* 18:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 18:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 18:39 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:37 ryankemper: [[phab:T267927|T267927]] [WDQS Data Reload] Clearing old wikidata journal file to free disk space before beginning data reload: `sudo systemctl status wdqs-blazegraph && sudo systemctl stop wdqs-blazegraph && sudo rm -fv /srv/wdqs/wikidata.jnl && sudo systemctl start wdqs-blazegraph` on `wdqs100[9,10]`
* 18:37 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1300.eqiad.wmnet
* 18:37 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2220.codfw.wmnet
* 18:32 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:29 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:21 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 18:20 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
* 18:14 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 17:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
* 17:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
* 17:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1300.eqiad.wmnet with reason: REIMAGE
* 17:35 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2220.codfw.wmnet with reason: REIMAGE
* 17:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1300.eqiad.wmnet with reason: REIMAGE
* 17:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2220.codfw.wmnet with reason: REIMAGE
* 17:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
* 17:07 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
* 17:01 gehel@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
* 16:47 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.29
* 16:21 moritzm: installing wireshark security updates
* 16:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - [[phab:T272836|T272836]]
* 16:11 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:59 volker-e@deploy1001: Finished deploy [design/style-guide@b9b7ee6]: Deploy design/style-guide: {{Gerrit|b9b7ee6}} “Components”: Fix components overview SVG rendering glitch (#439) (duration: 00m 07s)
* 15:59 volker-e@deploy1001: Started deploy [design/style-guide@b9b7ee6]: Deploy design/style-guide: {{Gerrit|b9b7ee6}} “Components”: Fix components overview SVG rendering glitch (#439)
* 15:32 papaul: power down logstash2035 for relocation
* 15:23 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 95 hosts with reason: upgrading openstack
* 15:22 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 95 hosts with reason: upgrading openstack
* 15:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 95 hosts with reason: upgrading openstack
* 15:22 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack
* 15:22 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: upgrading openstack
* 15:21 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 95 hosts with reason: upgrading openstack
* 15:15 papaul: power down mw2220 for maintenance
* 15:11 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.29 (duration: 01m 11s)
* 15:10 moritzm: readding ganeti5002 to the eqsin Ganeti cluster following mainboard replacement/reinstall [[phab:T261130|T261130]]
* 15:10 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.29
* 15:06 hashar@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/FeaturedFeeds: Revert "Caching fixes" [[phab:T264391|T264391]] (duration: 01m 25s)
* 14:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrading openstack
* 14:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 10 hosts with reason: upgrading openstack
* 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14270 and previous config saved to /var/cache/conftool/dbconfig/20210209-145206-root.json
* 14:50 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2001.codfw.wmnet
* 14:48 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host pybal-test2001.codfw.wmnet
* 14:43 gehel: rebooting wdqs1009 / 1010 for kernel upgrade
* 14:37 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.36.0-wmf.29"
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 85%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14269 and previous config saved to /var/cache/conftool/dbconfig/20210209-143703-root.json
* 14:29 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.29 (duration: 01m 06s)
* 14:28 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.29
* 14:26 volans: cd /srv/external-monitoring; git fetch/status/pull on wikitech-static - [[phab:T273951|T273951]]
* 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14268 and previous config saved to /var/cache/conftool/dbconfig/20210209-142159-root.json
* 14:21 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.29
* 14:14 gehel: depooling wdqs1005, catching up on lag
* 14:10 hashar@deploy1001: Synchronized php-1.36.0-wmf.29/includes/libs/objectcache/wancache/WANObjectCache.php: WANObjectCache: throw on Closure - [[phab:T273242|T273242]] (duration: 01m 08s)
* 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 60%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14267 and previous config saved to /var/cache/conftool/dbconfig/20210209-140655-root.json
* 13:52 Urbanecm: Deploy security patch ([[phab:T274152|T274152]])
* 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14266 and previous config saved to /var/cache/conftool/dbconfig/20210209-135152-root.json
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 40%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14265 and previous config saved to /var/cache/conftool/dbconfig/20210209-133648-root.json
* 13:25 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0) for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 30%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14264 and previous config saved to /var/cache/conftool/dbconfig/20210209-132145-root.json
* 13:08 twentyafterfour: restart phabricator daemons to free 3.5 GB of RAM (memory leak?)
* 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14263 and previous config saved to /var/cache/conftool/dbconfig/20210209-130641-root.json
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 20%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14262 and previous config saved to /var/cache/conftool/dbconfig/20210209-125138-root.json
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 15%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14261 and previous config saved to /var/cache/conftool/dbconfig/20210209-123634-root.json
* 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 13%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14260 and previous config saved to /var/cache/conftool/dbconfig/20210209-122131-root.json
* 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14259 and previous config saved to /var/cache/conftool/dbconfig/20210209-120627-root.json
* 12:05 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh for Hadoop analytics cluster: Change Hadoop distribution - elukey@cumin1001
* 12:02 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0) for Hadoop analytics cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 11:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2010.codfw.wmnet
* 11:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 11:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2008.codfw.wmnet
* 11:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2006.codfw.wmnet
* 11:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2005.codfw.wmnet
* 11:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1013.eqiad.wmnet
* 11:52 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1010.eqiad.wmnet
* 11:52 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1008.eqiad.wmnet
* 11:51 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1007.eqiad.wmnet
* 11:51 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=maps1006.eqiad.wmnet
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 8%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14258 and previous config saved to /var/cache/conftool/dbconfig/20210209-115124-root.json
* 11:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1013.eqiad.wmnet
* 11:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1001.eqiad.wmnet
* 11:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1014.eqiad.wmnet
* 11:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1014.eqiad.wmnet
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 5%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14257 and previous config saved to /var/cache/conftool/dbconfig/20210209-113620-root.json
* 11:34 elukey: start the upgrade process for Hadoop Analytics
* 11:33 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster for Hadoop analytics cluster: Stop the Hadoop cluster before maintenance. - elukey@cumin1001
* 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1015.eqiad.wmnet
* 11:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1015.eqiad.wmnet
* 11:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1016.eqiad.wmnet
* 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 4%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14256 and previous config saved to /var/cache/conftool/dbconfig/20210209-112116-root.json
* 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1016.eqiad.wmnet
* 11:17 vgutierrez: rolling restart of eqiad LVS instances to catch up on kernel upgrades
* 11:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3005.esams.wmnet
* 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 3%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14255 and previous config saved to /var/cache/conftool/dbconfig/20210209-110613-root.json
* 11:02 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3005.esams.wmnet
* 10:57 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
* 10:57 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database, still
* 10:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3006.esams.wmnet
* 10:53 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2001.codfw.wmnet
* 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 2%: Slowly pool db1157 into s3', diff saved to https://phabricator.wikimedia.org/P14254 and previous config saved to /var/cache/conftool/dbconfig/20210209-105109-root.json
* 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3006.esams.wmnet
* 10:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3007.esams.wmnet
* 10:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3007.esams.wmnet
* 10:41 vgutierrez: rolling restart of esams LVS instances to catch up on kernel upgrades
* 10:40 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2001.codfw.wmnet
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 100%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14253 and previous config saved to /var/cache/conftool/dbconfig/20210209-103443-root.json
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 100%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14252 and previous config saved to /var/cache/conftool/dbconfig/20210209-103414-root.json
* 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1157 for the first time in s3 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14251 and previous config saved to /var/cache/conftool/dbconfig/20210209-102109-marostegui.json
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 75%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14250 and previous config saved to /var/cache/conftool/dbconfig/20210209-101939-root.json
* 10:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1019.eqiad.wmnet
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 75%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14249 and previous config saved to /var/cache/conftool/dbconfig/20210209-101911-root.json
* 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1157 to dbctl, depooled [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14248 and previous config saved to /var/cache/conftool/dbconfig/20210209-101556-marostegui.json
* 10:13 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1019.eqiad.wmnet
* 10:12 gehel@cumin1001: START - Cookbook sre.wdqs.reboot
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 50%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14247 and previous config saved to /var/cache/conftool/dbconfig/20210209-100436-root.json
* 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 50%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14246 and previous config saved to /var/cache/conftool/dbconfig/20210209-100407-root.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 25%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14245 and previous config saved to /var/cache/conftool/dbconfig/20210209-094932-root.json
* 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 25%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14244 and previous config saved to /var/cache/conftool/dbconfig/20210209-094904-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3317 (re)pooling @ 10%: Slowly repooling db1090:3317 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14243 and previous config saved to /var/cache/conftool/dbconfig/20210209-093429-root.json
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1090:3312 (re)pooling @ 10%: Slowly repooling db1090:3312 after cloning db1170', diff saved to https://phabricator.wikimedia.org/P14242 and previous config saved to /var/cache/conftool/dbconfig/20210209-093400-root.json
* 09:22 godog: swift eqiad-prod: decrease weight for SSDs on ms-be[1019-1026] - [[phab:T272836|T272836]]
* 08:44 XioNoX: repool esams - [[phab:T272342|T272342]]
* 08:30 XioNoX: rollback redirect ns2 to authdns1001 - [[phab:T252631|T252631]]
* 08:09 XioNoX: alright, brace yourself, esams switch stack is going to go down
* 08:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 32 hosts with reason: switch upgrade
* 08:02 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on 32 hosts with reason: switch upgrade
* 07:54 XioNoX: redirect ns2 to authdns1001 - [[phab:T252631|T252631]]
* 07:47 hashar@deploy1001: Finished deploy [integration/docroot@672e79f]: build: Add /scap/log to gitignore (duration: 00m 06s)
* 07:47 hashar@deploy1001: Started deploy [integration/docroot@672e79f]: build: Add /scap/log to gitignore
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1081 from dbctl [[phab:T273040|T273040]]', diff saved to https://phabricator.wikimedia.org/P14241 and previous config saved to /var/cache/conftool/dbconfig/20210209-073455-marostegui.json
* 07:20 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
* 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14240 and previous config saved to /var/cache/conftool/dbconfig/20210209-072038-root.json
* 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14239 and previous config saved to /var/cache/conftool/dbconfig/20210209-070534-root.json
* 07:04 XioNoX: depool disable 2 uplinks on asw2-esams - [[phab:T272342|T272342]]
* 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14238 and previous config saved to /var/cache/conftool/dbconfig/20210209-065031-root.json
* 06:48 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 06:48 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 06:48 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 06:47 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@582b070]: 0.3.63 (duration: 06m 46s)
* 06:44 XioNoX: depool esams for network maintenance - [[phab:T272342|T272342]]
* 06:41 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.63` on canary `wdqs1003`; proceeding to rest of fleet
* 06:40 ryankemper@deploy1001: Started deploy [wdqs/wdqs@582b070]: 0.3.63
* 06:40 ryankemper: Pooled `wdqs1007` and depooled `wdqs1005` (`1005` is ~12 hours behind)
* 06:38 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.63`. Pre-deploy tests passing on canary `wdqs1003`
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14237 and previous config saved to /var/cache/conftool/dbconfig/20210209-063527-root.json
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14236 and previous config saved to /var/cache/conftool/dbconfig/20210209-062024-root.json
* 06:20 marostegui: Stop mysql on s2 and s7 on db1090 to clone db1170 [[phab:T258361|T258361]]
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14234 and previous config saved to /var/cache/conftool/dbconfig/20210209-061822-marostegui.json
* 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: Slowly repooling db1111 after onsite maintenance', diff saved to https://phabricator.wikimedia.org/P14233 and previous config saved to /var/cache/conftool/dbconfig/20210209-060520-root.json
* 05:02 krinkle@deploy1001: Finished deploy [integration/docroot@fdfb265]: {{Gerrit|I271e6054880}}, [[phab:T273247|T273247]] (duration: 00m 06s)
* 05:02 krinkle@deploy1001: Started deploy [integration/docroot@fdfb265]: {{Gerrit|I271e6054880}}, [[phab:T273247|T273247]]
* 01:56 tstarling@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/FeaturedFeeds: probable fix for UBN [[phab:T273242|T273242]] (duration: 01m 06s)
* 01:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1302.eqiad.wmnet
* 01:46 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1301.eqiad.wmnet
* 00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1302.eqiad.wmnet
* 00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1301.eqiad.wmnet
* 00:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1387.eqiad.wmnet
* 00:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1386.eqiad.wmnet
* 00:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1386.eqiad.wmnet
* 00:22 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1387.eqiad.wmnet
* 00:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1301.eqiad.wmnet with reason: REIMAGE
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1302.eqiad.wmnet with reason: REIMAGE


== 2021-02-08 ==
== 2021-06-29 ==
* 23:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1301.eqiad.wmnet with reason: REIMAGE
* 23:45 urbanecm: Evening B&C window done
* 23:58 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1302.eqiad.wmnet with reason: REIMAGE
* 23:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|367bc98}}: {{Gerrit|904d18720}}: flood flag changes for enwikibooks ([[phab:T285594|T285594]]) (duration: 01m 07s)
* 23:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2220.codfw.wmnet with reason: [[phab:T273803|T273803]]
* 23:45 urbanecm: Remove TrainBranchBot from wmf-deployment Gerrit group; it merges code to mediawiki-config without actually deploying it
* 23:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2220.codfw.wmnet with reason: [[phab:T273803|T273803]]
* 23:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: {{Gerrit|8a5b835050cc0f6d47b6fde2317db8743fcb9ce0}}: SpecialEditGrowthConfig: Do not use relative => true ([[phab:T285750|T285750]]) (duration: 01m 04s)
* 23:50 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2220.codfw.wmnet
* 23:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: {{Gerrit|c61fb175c82accda526105cc32457b07530d09fa}}: SpecialEditGrowthConfig: Do not use relative => true ([[phab:T285750|T285750]]) (duration: 01m 05s)
* 23:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2220.codfw.wmnet
* 23:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/: {{Gerrit|bad82665f8bea667aff049794612c270063c7519}}: Config option to enable topic subscriptions backend and dtenable=1 URL parameter ([[phab:T284491|T284491]]) (duration: 01m 05s)
* 23:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1386.eqiad.wmnet with reason: REIMAGE
* 23:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/: {{Gerrit|bad82665f8bea667aff049794612c270063c7519}}: Config option to enable topic subscriptions backend and dtenable=1 URL parameter ([[phab:T284491|T284491]]) (duration: 01m 06s)
* 23:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1386.eqiad.wmnet with reason: REIMAGE
* 23:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/: {{Gerrit|e77e002130a7815570ff967013757bacc7037fb0}}: Config option to enable topic subscriptions backend and dtenable=1 URL parameter ([[phab:T284491|T284491]]) (duration: 01m 09s)
* 23:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1387.eqiad.wmnet with reason: REIMAGE
* 21:58 maryum: deployed security patch [[phab:T285515|T285515]] to wmf.12
* 23:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1387.eqiad.wmnet with reason: REIMAGE
* 21:51 maryum: deployed security patch [[phab:T285515|T285515]] to wmf.11
* 23:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1388.eqiad.wmnet
* 21:44 maryum: deployed updated security patch for [[phab:T285190|T285190]] to wmf.12
* 23:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1303.eqiad.wmnet
* 21:42 maryum: deployed updated security patch for [[phab:T285190|T285190]] to wmf.11
* 23:16 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2245.codfw.wmnet
* 21:31 sbassett: Reverted and deployed updated security patch for [[phab:T285190|T285190]] to wmf.12
* 23:14 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1397.eqiad.wmnet
* 21:29 sbassett: Reverted and deployed updated security patch for [[phab:T285190|T285190]] to wmf.11
* 22:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1274.eqiad.wmnet
* 21:19 sbassett: Deployed updated security patch for [[phab:T285190|T285190]] to wmf.11
* 22:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1273.eqiad.wmnet
* 20:55 dancy: Deleted all CDB files on beta so they'll be recreated on the next scap sync-world run
* 22:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1272.eqiad.wmnet
* 20:26 dancy: Reverting to scap 3.17.1-1+0~20210419163335.8~1.gbpa6b2e0 in beta
* 22:51 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1271.eqiad.wmnet
* 19:43 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: REIMAGE
* 22:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2245.codfw.wmnet
* 19:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: REIMAGE
* 22:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1388.eqiad.wmnet
* 19:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: REIMAGE
* 22:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1303.eqiad.wmnet
* 19:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: REIMAGE
* 22:45 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1397.eqiad.wmnet
* 19:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: REIMAGE
* 22:29 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1274.eqiad.wmnet
* 19:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: REIMAGE
* 22:29 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1273.eqiad.wmnet
* 19:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: REIMAGE
* 22:29 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: REIMAGE
* 22:29 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1271.eqiad.wmnet
* 19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.12
* 21:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1303.eqiad.wmnet with reason: REIMAGE
* 18:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 21:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1303.eqiad.wmnet with reason: REIMAGE
* 18:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 21:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2245.codfw.wmnet with reason: REIMAGE
* 18:34 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc3
* 21:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2245.codfw.wmnet with reason: REIMAGE
* 18:28 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc2
* 21:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1397.eqiad.wmnet with reason: REIMAGE
* 18:21 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc1
* 21:34 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1397.eqiad.wmnet with reason: REIMAGE
* 18:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 21:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1388.eqiad.wmnet with reason: REIMAGE
* 18:09 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 21:29 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw1273.eqiad.wmnet with reason: reimaging
* 18:07 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.7 (duration: 04m 00s)
* 21:29 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw1273.eqiad.wmnet with reason: reimaging
* 17:59 urbanecm: Start server-side upload of ~2.5G of JPG files ([[phab:T282755|T282755]])
* 21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw1271.eqiad.wmnet with reason: reimaging
* 17:52 brennen@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.12 (duration: 57m 11s)
* 21:28 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw1271.eqiad.wmnet with reason: reimaging
* 16:55 ryankemper: [[phab:T281327|T281327]] `[Cirrus -> codfw]` Current banned nodes are `elastic2043` and `elastic2045`; `elastic2043` can be unbanned after a re-image, and `elastic2045` can be unbanned in ~30 minutes after shards rebalance (had heavy shards scheduled)
* 21:28 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1274.eqiad.wmnet with reason: REIMAGE
* 16:55 brennen@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.12
* 21:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1388.eqiad.wmnet with reason: REIMAGE
* 16:45 brennen: 1.37.0-wmf.12 was branched at {{Gerrit|3703c3194b590a1fcccb485245022eac369d2b69}} for [[phab:T281153|T281153]]
* 21:25 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1272.eqiad.wmnet with reason: REIMAGE
* 16:28 ebernhardson: temporarily ban elastic2045 from production-search-codfw
* 21:25 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1274.eqiad.wmnet with reason: REIMAGE
* 15:43 dcausse: unbanning elastic2054
* 21:24 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1304.eqiad.wmnet
* 15:30 dcausse: restarting blazegraph on wdqs1012
* 21:24 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1271.eqiad.wmnet with reason: REIMAGE
* 15:17 effie: pool mw2383 back
* 21:24 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1273.eqiad.wmnet with reason: REIMAGE
* 15:15 mutante: [mwlog2002:~] $ sudo systemctl start mw-log-cleanup
* 21:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1271.eqiad.wmnet with reason: REIMAGE
* 15:06 dcausse: banning elastic2054
* 21:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1273.eqiad.wmnet with reason: REIMAGE
* 14:53 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[1-2].codfw.wmnet,service=canary
* 21:23 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1272.eqiad.wmnet with reason: REIMAGE
* 14:52 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[8-9].codfw.wmnet,service=canary
* 21:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1305.eqiad.wmnet
* 14:52 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw225[1-2].codfw.wmnet,service=canary
* 21:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1304.eqiad.wmnet
* 14:52 effie: depool mw2383 as it is misbehaving
* 21:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1305.eqiad.wmnet
* 14:47 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 21:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1389.eqiad.wmnet
* 14:47 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 21:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1389.eqiad.wmnet
* 14:47 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw226[1-2].codfw.wmnet
* 21:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1390.eqiad.wmnet
* 14:47 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2290.codfw.wmnet
* 21:07 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1390.eqiad.wmnet
* 14:46 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 20:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1304.eqiad.wmnet with reason: REIMAGE
* 14:46 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw22[7-8][0-9].codfw.wmnet
* 20:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1304.eqiad.wmnet with reason: REIMAGE
* 14:45 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw225[1-8].codfw.wmnet
* 20:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1305.eqiad.wmnet with reason: REIMAGE
* 14:44 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 20:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1305.eqiad.wmnet with reason: REIMAGE
* 14:44 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw225[1-8].codfw.wmnet,service=api_appserver
* 20:20 jgiannelos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 14:43 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 20:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1389.eqiad.wmnet with reason: REIMAGE
* 14:38 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 20:17 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Undo migration of SpecialMuteSubmit on all wikis except testwiki - [[phab:T268517|T268517]] (duration: 01m 06s)
* 14:38 _joe_: restarting php-fpm on mw2383
* 20:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1389.eqiad.wmnet with reason: REIMAGE
* 14:38 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 20:16 jgiannelos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 14:37 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1390.eqiad.wmnet with reason: REIMAGE
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2103 (s1) weight a bit', diff saved to https://phabricator.wikimedia.org/P16739 and previous config saved to /var/cache/conftool/dbconfig/20210629-143742-marostegui.json
* 20:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1390.eqiad.wmnet with reason: REIMAGE
* 14:37 _joe_: repooling mw2383
* 20:11 jgiannelos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 14:36 _joe_: depooling mw2383
* 19:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1391.eqiad.wmnet
* 14:30 legoktm@deploy1002: Synchronized wmf-config/db-codfw.php: fix trwikivoyage (duration: 01m 01s)
* 19:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
* 14:29 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
* 19:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
* 14:29 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
* 19:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
* 14:28 Krinkle: TODO: Don't duplicate `sectionsByDB` between db-* files
* 19:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1391.eqiad.wmnet
* 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 19:48 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ca9bba1]: cirrus_namespace_map: only overwrite on success (duration: 01m 19s)
* 14:23 jayme@cumin1001: MediaWiki read-only period ends at: 2021-06-29 14:23:23.504447
* 19:47 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ca9bba1]: cirrus_namespace_map: only overwrite on success
* 14:23 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 19:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
* 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 19:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
* 14:23 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 19:38 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
* 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 19:20 urbanecm@deploy1001: Synchronized wmf-config/config/dawiki.yaml: {{Gerrit|3f39eefaa4c0dabfbc5b03fdc1b12e48913089bd}}: Enable GrowthExperiments at dawiki ([[phab:T256126|T256126]]; 3/3) (duration: 01m 04s)
* 14:22 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 19:18 urbanecm@deploy1001: Synchronized dblists/growthexperiments.dblist: {{Gerrit|3f39eefaa4c0dabfbc5b03fdc1b12e48913089bd}}: Enable GrowthExperiments at dawiki ([[phab:T256126|T256126]]; 2/3) (duration: 01m 03s)
* 14:22 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 19:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f39eefaa4c0dabfbc5b03fdc1b12e48913089bd}}: Enable GrowthExperiments at dawiki ([[phab:T256126|T256126]]) (duration: 01m 05s)
* 14:22 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 19:13 dduvall@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 14:22 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 19:11 dduvall@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 14:21 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1391.eqiad.wmnet with reason: REIMAGE
* 14:21 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 19:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1391.eqiad.wmnet with reason: REIMAGE
* 14:21 jayme@cumin1001: MediaWiki read-only period starts at: 2021-06-29 14:21:26.671853
* 19:08 dduvall@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 14:21 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 19:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3e94e2177b7f31bea1c6bc21b272a4529a38b4b3}}: Make DiscussionTools newtopictool available on testwiki (duration: 01m 07s)
* 14:15 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 18:52 mutante: mw1391 - reimaging
* 14:15 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 18:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2037.codfw.wmnet
* 14:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 44 hosts with reason: DC switchover
* 18:29 ppchelko@deploy1001: Finished deploy [restbase/deploy@a458845]: Add trwikivoyage [[phab:T271262|T271262]] and restore restbase2009 (duration: 17m 13s)
* 14:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 44 hosts with reason: DC switchover
* 18:27 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2037.codfw.wmnet
* 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2036.codfw.wmnet
* 14:11 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2036.codfw.wmnet
* 14:10 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:21 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2035.codfw.wmnet
* 14:09 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:16 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2035.codfw.wmnet
* 14:08 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 18:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2034.codfw.wmnet
* 14:02 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 18:12 ppchelko@deploy1001: Started deploy [restbase/deploy@a458845]: Add trwikivoyage [[phab:T271262|T271262]] and restore restbase2009
* 14:01 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 18:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2034.codfw.wmnet
* 14:01 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 18:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2033.codfw.wmnet
* 13:51 otto@deploy1002: Started deploy [analytics/refinery@edc31a2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@edc31a2]
* 18:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2033.codfw.wmnet
* 13:49 otto@deploy1002: Finished deploy [analytics/refinery@edc31a2] (thin): Regular analytics weekly train THIN [analytics/refinery@edc31a2] (duration: 00m 07s)
* 17:57 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
* 13:49 otto@deploy1002: Started deploy [analytics/refinery@edc31a2] (thin): Regular analytics weekly train THIN [analytics/refinery@edc31a2]
* 17:57 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database
* 13:49 otto@deploy1002: Finished deploy [analytics/refinery@edc31a2]: Regular analytics weekly train [analytics/refinery@COMMIT_HASH] (duration: 17m 42s)
* 17:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2032.codfw.wmnet
* 13:35 volker-e@deploy1002: Finished deploy [design/style-guide@e97fccb]: Deploy design/style-guide: {{Gerrit|e97fccb}} styles: Add internationalization and accessibility note labels and treatments (#476) (duration: 00m 07s)
* 17:38 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2032.codfw.wmnet
* 13:34 volker-e@deploy1002: Started deploy [design/style-guide@e97fccb]: Deploy design/style-guide: {{Gerrit|e97fccb}} styles: Add internationalization and accessibility note labels and treatments (#476)
* 17:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2031.codfw.wmnet
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@edc31a2]: Regular analytics weekly train [analytics/refinery@COMMIT_HASH]
* 17:31 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2031.codfw.wmnet
* 11:54 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:702095{{!}}vector: Finish enabling language switcher treatment A/B test on fawiki (T269093)]] (duration: 00m 56s)
* 17:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2030.codfw.wmnet
* 11:38 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/Wikibase/repo/: Backport: [[gerrit:702018{{!}}Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634)]], Part II (duration: 00m 58s)
* 17:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2030.codfw.wmnet
* 11:36 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/Wikibase/repo/includes/Rdf/PropertyStubRdfBuilder.php: Backport: [[gerrit:702018{{!}}Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634)]], Part I (duration: 00m 56s)
* 17:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2029.codfw.wmnet
* 11:35 ladsgroup@deploy1002: sync-file aborted: Backport: [[gerrit:702018{{!}}Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634)]] (duration: 00m 10s)
* 17:23 otto@deploy1001: Synchronized wmf-config/CommonSettings.php: CommonSettings - Add eventgate-analytics-external - [[phab:T272863|T272863]] (no-op) (duration: 01m 06s)
* 10:30 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on acmechief* after switch towards nginx-light [[phab:T164456|T164456]]
* 17:21 otto@deploy1001: Synchronized wmf-config/ProductionServices.php: ProductionServices - Add eventgate-analytics-external - [[phab:T272863|T272863]] (no-op) (duration: 01m 06s)
* 09:27 moritzm: installing nettle security updates on buster
* 17:20 otto@deploy1001: sync-file aborted: ProductionServices - Add eventgate-analytics-external - [[phab:T272863|T272863]] (no-op) (duration: 00m 02s)
* 08:47 elukey: repool mw13[55,84] after debugging - [[phab:T285634|T285634]]
* 17:20 otto@deploy1001: Synchronized wmf-config/LabsServices.php: LabsServices - Add eventgate-analytics-external - [[phab:T272998|T272998]] (duration: 01m 08s)
* 08:46 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1384.eqiad.wmnet
* 17:19 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2029.codfw.wmnet
* 08:46 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1355.eqiad.wmnet
* 17:19 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2027.codfw.wmnet
* 08:43 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 17:12 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2027.codfw.wmnet
* 08:25 elukey: cumin 'A:mw-eqiad' '/usr/local/sbin/restart-php7.2-fpm' -b 2 -s 30 - [[phab:T285634|T285634]]
* 17:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2026.codfw.wmnet
* 08:21 elukey: depool mw1355 (mw appserver) for debugging - [[phab:T285634|T285634]]
* 17:06 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2026.codfw.wmnet
* 08:21 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1355.eqiad.wmnet
* 16:30 XioNoX: adding option-82 to all prod vlans DHCP - [[phab:T269855|T269855]]
* 08:12 hashar: Upgrading Jenkins on contint2001 / contint1001 and restarting CI Jenkins # [[phab:T285531|T285531]]
* 16:02 Urbanecm: Deploy security patch ([[phab:T71367|T71367]])
* 08:03 hashar: Upgraded Jenkins on releases1002 / releases2002 # [[phab:T285531|T285531]]
* 15:49 gehel: repool wdqs1012 - caught up on lag
* 08:02 hashar: Upgraded Jenkins on releases1002 / releases2002
* 15:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:50 godog: remove 20G migration data /root/prometheus from prometheus4001 - [[phab:T243057|T243057]]
* 15:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 07:48 godog: remove old /root/prometheus data from prometheus4001
* 15:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
* 07:05 moritzm: upgrading bullseye early installs to the latest state of testing [[phab:T275873|T275873]]
* 15:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on maps1001.eqiad.wmnet with reason: Server being relocated
* 06:46 tstarling@deploy1002: Synchronized php-1.37.0-wmf.11/includes/MediaWiki.php: Add statsd action timing metric [[phab:T284274|T284274]] (duration: 00m 58s)
* 15:37 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on maps1001.eqiad.wmnet with reason: Server being relocated
* 02:47 cdanis: ✔️ cdanis@cumin2001.codfw.wmnet ~ 🕥🍺 sudo cumin -b16 'A:cp-upload and A:codfw' 'run-puppet-agent -q'
* 15:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
* 02:34 ryankemper: [[phab:T285643|T285643]] Banned `elastic1039` from all 3 elasticsearch clusters and set `elastic1039.eqiad.wmnet` to failed in netbox
* 15:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:27 cdanis: ✔️ cdanis@cumin2001.codfw.wmnet ~ 🕥🍺 sudo cumin -b16 'A:cp-upload' 'run-puppet-agent -q'
* 15:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2001.codfw.wmnet
* 02:25 eileen: civicrm revision changed from {{Gerrit|927ab7cff7}} to {{Gerrit|789c92d13b}}, config revision is {{Gerrit|1739c53fcb}}
* 02:04 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@0e916b1]: 0.3.75 (duration: 08m 40s)
* 01:57 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.75` on canary `wdqs1003`; proceeding to rest of fleet
* 01:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@0e916b1]: 0.3.75
* 01:50 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.75`. Pre-deploy tests passing on canary `wdqs1003`
* 00:25 Krinkle: krinkle@mwmaint1002: purgeParserCache.php --tag pc1, ref [[phab:T282761|T282761]]
 
== 2021-06-28 ==
* 23:07 urbanecm: Evening B&C window done
* 23:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5ec855d14b31a9392274c2bfe2e21e2ad44986bc}}: Enable Parsoid inspired media structure on test wikis ([[phab:T51097|T51097]]) (duration: 00m 59s)
* 22:51 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 22:51 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 22:50 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 22:48 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 22:48 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
* 22:44 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
* 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
* 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
* 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 22:43 legoktm@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2021-06-28 22:43:04.512602
* 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 22:42 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 22:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 22:42 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 22:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 22:42 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 22:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
* 22:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 22:41 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 22:41 legoktm@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2021-06-28 22:41:41.222740
* 22:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 22:40 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 22:40 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 22:40 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 22:38 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 22:38 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 22:32 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 22:32 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 22:32 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 22:31 legoktm: starting DC switchover live test, which will "switch" us from codfw -> eqiad
* 22:28 eileen: civicrm revision changed from {{Gerrit|9d1203fb28}} to {{Gerrit|927ab7cff7}}, config revision is {{Gerrit|1739c53fcb}}
* 22:09 legoktm: live-hacked spicerack on cumin1001 to ignore x2, see https://phabricator.wikimedia.org/T285519#7182377
* 21:55 Krinkle: krinkle@mwmaint1002: purgeParserCache.php --tag pc2, ref [[phab:T282761|T282761]]
* 20:03 cstone: payments-wiki revision is {{Gerrit|d9892207c1}}
* 19:48 krinkle@deploy1002: Synchronized php-1.37.0-wmf.11/maintenance/: {{Gerrit|I618bc1e8ca3008}} (duration: 00m 56s)
* 19:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.11/includes/libs/objectcache/: [[phab:T282761|T282761]] - {{Gerrit|I618bc1e8ca3008}} (duration: 00m 56s)
* 19:45 krinkle@deploy1002: Synchronized php-1.37.0-wmf.11/includes/objectcache/SqlBagOStuff.php: [[phab:T282761|T282761]] - {{Gerrit|I618bc1e8ca3008}} (duration: 00m 59s)
* 18:40 ebernhardson@deploy1002: Synchronized wmf-config/: [[phab:T281515|T281515]]: Prepare Cirrus more_like for dc switchover (duration: 01m 02s)
* 18:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/WelcomeSurveyHooks.php: {{Gerrit|ecf1d6c47c6cc30d84161e023373c6a2c7287be8}}: Make it possible to force opt-in/opt-out to Growth features during account creation ([[phab:T284119|T284119]]; [[phab:T284800|T284800]]; 3/3) (duration: 00m 55s)
* 18:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/HelpPanelHooks.php: {{Gerrit|ecf1d6c47c6cc30d84161e023373c6a2c7287be8}}: Make it possible to force opt-in/opt-out to Growth features during account creation ([[phab:T284119|T284119]]; [[phab:T284800|T284800]]; 2/3) (duration: 00m 55s)
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/HomepageHooks.php: {{Gerrit|ecf1d6c47c6cc30d84161e023373c6a2c7287be8}}: Make it possible to force opt-in/opt-out to Growth features during account creation ([[phab:T284119|T284119]]; [[phab:T284800|T284800]]; 1/3) (duration: 00m 58s)
* 18:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/VisualEditor/: {{Gerrit|794a46c861dbf5ac05ec824d7591e507c1eefd16}}: Hotfix for broken "Extract show all to placeholder class" ([[phab:T284636|T284636]]; [[phab:T285571|T285571]]) (duration: 00m 57s)
* 18:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4ae0fdd3ad19105fc36bd1eb7102dea9c4a5178d}}: Enable DiscussionTools topicsubscription as beta feature on partner wikis ([[phab:T274280|T274280]]) (duration: 00m 57s)
* 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5b59184804b2ee0697ad20102eca0646aec4b105}}: Remove redundant wgDiscussionToolsEnable overrides (duration: 00m 56s)
* 18:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1043c931b4b41d13410e91336f87b122b5447959}}: Growth: Enable community configuration at all Growth wikis ([[phab:T285423|T285423]]) (duration: 00m 56s)
* 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:44 sukhe: Traffic: depool eqiad from user traffic
* 15:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:30 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-.*,name=eqiad
* 15:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host krb2001.codfw.wmnet
* 15:09 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
* 15:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1002.wikimedia.org
* 15:08 jayme@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
* 15:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:07 gehel: restarting wdqs-updater on all wdqs hosts for new configuration
* 14:54 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
* 14:53 jayme@cumin1001: Switching services swift, proton, mathoid, restbase, swift-ro, eventstreams, search, shellbox, eventgate-analytics-external, wdqs-internal, kartotherian, api-gateway, termbox, mobileapps, similar-users, wikifeeds, apertium, restbase-async, eventgate-main, eventgate-logging-external, ores, sessionstore, linkrecommendation, echostore, push-notifications, citoid, zotero, eventgate-analytics, wdqs, eventstreams-i
* 14:53 jayme@cumin1001: START - Cookbook sre.switchdc.services.01-switch-dc
* 14:37 jayme@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=99)
* 14:36 jayme@cumin1001: Switching services kartotherian, proton, wdqs-internal, wikifeeds, zotero, recommendation-api, swift-ro, linkrecommendation, mobileapps, citoid, eventgate-analytics, push-notifications, eventstreams-internal, mathoid, similar-users, schema, apertium, restbase-async, shellbox, termbox, wdqs, ores, eventgate-analytics-external, swift, helm-charts, restbase, cxserver, search, sessionstore, eventstreams, api-gate
* 14:36 jayme@cumin1001: START - Cookbook sre.switchdc.services.01-switch-dc
* 14:35 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
* 14:29 jayme@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
* 14:21 effie: restarted mw[1322,1329,1333,1350,1351,1352,1353,1354,1366,1367,1368,1370,1372,1373]
* 14:07 effie: restarting busy php-fpm app servers
* 13:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:701503{{!}}Remove $wmgWikibaseRepoForeignRepositories (T257260)]] (2/2, beta) (duration: 00m 57s)
* 13:06 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:701503{{!}}Remove $wmgWikibaseRepoForeignRepositories (T257260)]] (1/2, prod) (duration: 00m 57s)
* 12:59 moritzm: installing intel-microcode security updates on buster
* 12:30 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes/media/MediaHandler.php: Backport: [[gerrit:701718{{!}}media: Handle lack of 'metadata' key from getSizeAndMetadata gracefully (T285490)]] (duration: 00m 56s)
* 12:24 dcausse: repool wdqs1012
* 12:00 Lucas_WMDE: EU backport+config window done
* 11:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:701502{{!}}Stop setting Wikibase repo foreignRepositories (T257260)]] (duration: 00m 55s)
* 11:40 XioNoX: push "Port cloud-in4 to Capirca" to cr1/2-eqiad
* 11:38 XioNoX: push "Port cloud-in4 to Capirca" to cr1/2-codfw
* 11:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e4a088fbcf90a8c232c616ba5df9ad01cb5449e8}}: vector: Enable language switcher treatment A/B test on fawiki ([[phab:T269093|T269093]]) (duration: 00m 55s)
* 11:28 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/modules/signup/campaign.less: {{Gerrit|cd16aa2b51fb74e628c4ad26ac6b469bc04ab370}}: Donor campaign: fix signup page styling ([[phab:T284740|T284740]]) (duration: 00m 56s)
* 11:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9495d18521c3639cbd783a6341fe5348e31b0103}}: GrowthExperiments: Update campaign pattern ([[phab:T284800|T284800]]) (duration: 00m 56s)
* 11:20 Lucas_WMDE: lucaswerkmeister-wmde@mw1384:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache && rmdir /srv/mediawiki/php-1.37.0-wmf.1' # per comments in [[phab:T157030|T157030]] and similar tasks
* 11:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from buster master maps1009
* 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from buster master maps1009
* 11:18 Lucas_WMDE: lucaswerkmeister-wmde@mw1384:~$ scap pull # did not print any errors
* 11:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ade641b39bae8f2abb5d318299b033bfd8a7cb7a}}: Deploy ContentTranslation out of Beta feature in 9 WPs ([[phab:T284641|T284641]]) (duration: 00m 56s)
* 10:44 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:701884{{!}} Bumping portals to master (T128546)]] (duration: 00m 56s)
* 10:43 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:701884{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 10:25 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2007.codfw.wmnet with reason: REIMAGE
* 10:23 mutante: sodium - restarted nginx
* 10:23 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2007.codfw.wmnet with reason: REIMAGE
* 10:22 mutante: sodium (mirrors.wikimedia.org) - switching to nginx light variant [[phab:T164456|T164456]]
* 10:11 vgutierrez: rolling upgrade of ATS on eqiad - [[phab:T285535|T285535]]
* 10:11 moritzm: installing remaining libxml2 security updates
* 09:52 vgutierrez: rolling upgrade of ATS on esams - [[phab:T285535|T285535]]
* 09:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:701501{{!}}Remove $wmgWikibaseClientChangesDatabase (T257260)]] (2/2, beta) (duration: 00m 56s)
* 09:41 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config


== 2021-02-07 ==
== 2021-06-03 ==
* 22:58 Urbanecm: Reset password for TheresNoTime ([[phab:T274087|T274087]])
* 23:41 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T280886|T280886]] (duration: 00m 56s)
* 23:40 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [[phab:T280886|T280886]] (duration: 00m 57s)
* 23:33 mutante: installing OS on fresh VM doh5001
* 23:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE
* 23:28 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE
* 23:09 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:694686{{!}}Restrict changetags to sysops and bots on meta]] [[phab:T283625|T283625]] (duration: 00m 58s)
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p T280382 wdqs2001.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 22:39 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 22:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:36 ryankemper: [[phab:T280382|T280382]] Cancelled transfer to `wdqs1005`; the source host `wdqs1013` has a `wikidata.jnl` that is 80% too big; will transfer from different node -> `wdqs1005` and then fix the journal on `wdqs1013` after
* 22:36 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 22:35 ryankemper: [[phab:T280382|T280382]] `wdqs2005.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 22:28 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:15 robh@cumin1001: START - Cookbook sre.dns.netbox
* 21:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:54 shdubsh: restart kafka on kafka-logging to take new retention config
* 20:47 sbassett: Deployed security patch for [[phab:T282932|T282932]]
* 20:37 ebernhardson: restart mjolnir-kafka-bulk-daemon on search-loader[12]001
* 20:35 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container (duration: 01m 00s)
* 20:34 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 20:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:34 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container
* 20:34 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 20:34 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 19:58 mutante: [mwmaint1002:~] $ /usr/local/bin/systemd-timer-mail-wrapper -T root@mwmaint1002.eqiad.wmnet --only-on-error /usr/local/bin/cross-validate-accounts
* 19:56 mutante: [mwmaint1002:~] $ sudo systemctl start daily_account_consistency_check.service
* 19:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org
* 19:41 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
* 19:39 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs (duration: 04m 27s)
* 19:37 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh5001.wikimedia.org
* 19:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs
* 19:33 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images - [[phab:T251918|T251918]] -  icinga-wm> RECOVERY - Check systemd state on deneb is OK
* 19:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 19:32 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images
* 19:28 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 19:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 19:27 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 19:27 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5001.wikimedia.org
* 19:14 mutante: install1003 - restarting nginx after we switched from nginx-full to nginx-light package, same on other install servers [[phab:T164456|T164456]]
* 19:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE
* 19:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE
* 19:03 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE
* 19:01 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE
* 18:52 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter (duration: 00m 31s)
* 18:51 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter
* 18:46 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p T280382 wdqs2005.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 18:46 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p T280382 wdqs1005.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 18:39 ryankemper: [WDQS] depooled `wdqs1012` (has ~15 hours of lag to catch up on)
* 18:37 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph on the host has been locked up for ~16 hours based off of https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1622683465757&to=1622745461547)
* 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729
* 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729
* 18:28 mutante: temp. disabling puppet on install* servers. switching nginx to light variant ([[phab:T164456|T164456]])
* 18:16 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter (duration: 00m 15s)
* 18:16 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter
* 17:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE
* 17:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE
* 17:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE
* 17:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE
* 17:37 brennen: gitlab1001: re-running install-gitlab-server.sh
* 17:16 urandom: remove dropped Cassandra keyspace snapshots -- [[phab:T258414|T258414]]
* 16:55 ejegg: updated payments-wiki from {{Gerrit|6fac77f60e}} to {{Gerrit|7be0534b91}}
* 16:23 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 15:49 topranks: Gerrit 697993: Change BGP peer IP for doh3002 on esams CRs.
* 15:27 papaul: PDU replacement complete
* 15:25 moritzm: upgrading gitlab to 13.11.5
* 15:08 papaul: disconnect ps2-d8-codfw for replacement
* 14:55 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:54 topranks: Gerrit 697970: Add Wikidough BGP peerings on esams CRs for doh3001 and doh3002.
* 14:23 moritzm: installing nginx security updates on buster
* 14:12 moritzm: installing postgresql-9.6 security updates
* 13:55 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16285 and previous config saved to /var/cache/conftool/dbconfig/20210603-130059-root.json
* 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16284 and previous config saved to /var/cache/conftool/dbconfig/20210603-124556-root.json
* 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16283 and previous config saved to /var/cache/conftool/dbconfig/20210603-123243-root.json
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16282 and previous config saved to /var/cache/conftool/dbconfig/20210603-123052-root.json
* 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16281 and previous config saved to /var/cache/conftool/dbconfig/20210603-121739-root.json
* 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16280 and previous config saved to /var/cache/conftool/dbconfig/20210603-121548-root.json
* 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P16279 and previous config saved to /var/cache/conftool/dbconfig/20210603-121205-marostegui.json
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16278 and previous config saved to /var/cache/conftool/dbconfig/20210603-121133-root.json
* 12:06 moritzm: restarting FPM on mw canaries to pick up lz4 update
* 12:03 moritzm: installing lz4 security updates on buster
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16277 and previous config saved to /var/cache/conftool/dbconfig/20210603-120235-root.json
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16276 and previous config saved to /var/cache/conftool/dbconfig/20210603-115628-root.json
* 11:53 moritzm: installing curl security updates on stretch
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16275 and previous config saved to /var/cache/conftool/dbconfig/20210603-114731-root.json
* 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16274 and previous config saved to /var/cache/conftool/dbconfig/20210603-114503-root.json
* 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157', diff saved to https://phabricator.wikimedia.org/P16273 and previous config saved to /var/cache/conftool/dbconfig/20210603-114325-marostegui.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16272 and previous config saved to /var/cache/conftool/dbconfig/20210603-114124-root.json
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16271 and previous config saved to /var/cache/conftool/dbconfig/20210603-113000-root.json
* 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16270 and previous config saved to /var/cache/conftool/dbconfig/20210603-112620-root.json
* 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16269 and previous config saved to /var/cache/conftool/dbconfig/20210603-112243-marostegui.json
* 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16268 and previous config saved to /var/cache/conftool/dbconfig/20210603-111456-root.json
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e84096857c8a2f753e077aa6c3e37b910b9e1fcd}}: jawiki: extended confirmed should be 120 days since first edit, not registration ([[phab:T284212|T284212]]) (duration: 00m 58s)
* 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16267 and previous config saved to /var/cache/conftool/dbconfig/20210603-110906-root.json
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16266 and previous config saved to /var/cache/conftool/dbconfig/20210603-105953-root.json
* 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16265 and previous config saved to /var/cache/conftool/dbconfig/20210603-105536-marostegui.json
* 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16264 and previous config saved to /var/cache/conftool/dbconfig/20210603-105402-root.json
* 10:52 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:41 godog: test librenms/AM paging
* 10:40 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16263 and previous config saved to /var/cache/conftool/dbconfig/20210603-103858-root.json
* 10:28 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16262 and previous config saved to /var/cache/conftool/dbconfig/20210603-102354-root.json
* 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Purging parsercache [[phab:T282761|T282761]]
* 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Purging parsercache [[phab:T282761|T282761]]
* 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P16261 and previous config saved to /var/cache/conftool/dbconfig/20210603-101950-marostegui.json
* 10:13 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc2 primary [[phab:T282761|T282761]] (duration: 00m 58s)
* 09:38 marostegui: Deploy schema change on s3 codfw master (with replication) - [[phab:T282373|T282373]] [[phab:T282372|T282372]] [[phab:T282371|T282371]]
* 09:37 moritzm: upgrading eqiad to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range) [[phab:T235162|T235162]]
* 08:55 moritzm: uploading gitlab-ce 13.11.5-ce to apt.wikimedia.org thirdparty/gitlab
* 08:43 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:37 moritzm: upgrading codfw to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range) [[phab:T235162|T235162]]
* 08:23 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:09 moritzm: upgrading esams/eqsin to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range)
* 07:52 ryankemper: [WDQS] Pooled `wdqs1008` and `wdqs2006` (all caught up on lag)
* 07:48 moritzm: uploaded debmonitor-client 0.3.0-1+deb10u2 to apt.wikimedia.org
* 06:24 ryankemper: [WDQS] De-pooled `wdqs1008` and `wdqs2006` (~1 hour of lag to catch up on)
* 06:23 ryankemper: [[phab:T280382|T280382]] `wdqs2006.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 06:23 ryankemper: [[phab:T280382|T280382]] `wdqs1008.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 06:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 05:20 marostegui: Deploy schema change on db1121, lag will appear on s4 (commonswiki) wiki replicas - [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P16259 and previous config saved to /var/cache/conftool/dbconfig/20210603-051853-marostegui.json
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16258 and previous config saved to /var/cache/conftool/dbconfig/20210603-051402-root.json
* 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16257 and previous config saved to /var/cache/conftool/dbconfig/20210603-045859-root.json
* 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16256 and previous config saved to /var/cache/conftool/dbconfig/20210603-044355-root.json
* 04:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1005.eqiad.wmnet --dest wdqs1008.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 04:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:36 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2004.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 04:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 04:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 04:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 04:30 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2004.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 04:29 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 04:29 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1005.eqiad.wmnet --dest wdqs1008.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 04:29 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16255 and previous config saved to /var/cache/conftool/dbconfig/20210603-042851-root.json
* 02:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1008.eqiad.wmnet with reason: REIMAGE
* 02:20 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1008.eqiad.wmnet with reason: REIMAGE
* 02:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2006.codfw.wmnet with reason: REIMAGE
* 02:07 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p T280382 wdqs1008.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 02:07 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2006.codfw.wmnet with reason: REIMAGE
* 02:05 ryankemper: [[phab:T280382|T280382]] `wdqs1003.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv`
* 02:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:51 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p T280382 wdqs2006.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 01:47 ryankemper: [[phab:T280382|T280382]] `wdqs2003.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv`
* 01:43 ryankemper: [WDQS] Pooled `wdqs1004` (caught up on lag)
* 01:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:40 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/Gadgets: Backport: [[gerrit:697816{{!}}Reduce message parse in GadgetHooks::getPreferences (second time) (T58633 T278650)]], Try II (duration: 00m 57s)
* 00:36 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/user/UserOptionsManager.php: Backport: [[gerrit:697818{{!}}user: Accept options-messages for multiselect user options (T58633 T278650)]] (duration: 00m 57s)
* 00:35 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 00:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:18 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 00:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 00:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)


== 2021-02-06 ==
== 2021-06-02 ==
* 08:59 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 23:57 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 08:58 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 23:57 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 08:52 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh-clients (exit_code=0) for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 23:56 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 08:52 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh-clients for Hadoop test cluster: Change Hadoop distribution - elukey@cumin1001
* 23:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 03:40 ryankemper: Deleted dump taking up diskspace on `wdqs1009`, disk space warning will resolve now
* 23:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:30 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1319.eqiad.wmnet
* 23:47 ryankemper: [[phab:T280382|T280382]] `wdqs1004.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.9T  998G  1.8T  36% /srv`
* 01:29 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1313.eqiad.wmnet
* 23:41 ladsgroup@deploy1002: scap failed: average error rate on 4/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 01:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1319.eqiad.wmnet
* 23:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:25 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1313.eqiad.wmnet
* 23:28 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 01:00 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2265.codfw.wmnet
* 23:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 00:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1366.eqiad.wmnet
* 23:26 ryankemper: [[phab:T280382|T280382]] `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid10`: `/dev/mapper/vg0-srv  2.7T  998G  1.6T  39% /srv`
* 00:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1366.eqiad.wmnet
* 23:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 00:46 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2265.codfw.wmnet
* 23:18 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes: Backport: [[gerrit:697817{{!}}Allow html form field option 'options-messages' to get parsed (T58633)]] (duration: 01m 01s)
* 00:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1313.eqiad.wmnet with reason: REIMAGE
* 22:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
* 00:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1313.eqiad.wmnet with reason: REIMAGE
* 22:54 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
* 00:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1319.eqiad.wmnet with reason: REIMAGE
* 22:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:697855{{!}}Enable wgVectorConsolidateUserLinks on the beta cluster (T266536)]] (duration: 00m 57s)
* 00:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1319.eqiad.wmnet with reason: REIMAGE
* 22:39 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage_2`
* 00:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2265.codfw.wmnet with reason: REIMAGE
* 22:34 ryankemper: [[phab:T280382|T280382]] Cleaned up no-longer-needed files removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/697832 => `ryankemper@cumin1001:~$ sudo -E cumin -b 2 'P<nowiki>{</nowiki>apt*<nowiki>}</nowiki>' 'sudo rm -rfv /srv/tftpboot/buster-raid0-installer/pxelinux.cfg'`
* 00:17 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2265.codfw.wmnet with reason: REIMAGE
* 22:30 ryankemper: [[phab:T280382|T280382]] Cleaned up no-longer-needed files removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/697832 => `ryankemper@cumin1001:~$ sudo -E cumin -b 6 'P<nowiki>{</nowiki>install*<nowiki>}</nowiki>' 'sudo rm -fv /srv/tftpboot/buster-raid0-installer/pxelinux.cfg'`
* 00:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1366.eqiad.wmnet with reason: REIMAGE
* 22:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE
* 00:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1366.eqiad.wmnet with reason: REIMAGE
* 22:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE
* 22:19 Amir1: setting charset of all tables in wikitech to binary ([[phab:T284108|T284108]] [[phab:T269348|T269348]])
* 22:11 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1003.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage_2`
* 22:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:07 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 22:07 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1004.eqiad.wmnet
* 22:07 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs2007.codfw.wmnet
* 22:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:59 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 21:59 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
* 21:56 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1004.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 21:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1004.eqiad.wmnet with reason: REIMAGE
* 21:38 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3002.wikimedia.org
* 21:37 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1004.eqiad.wmnet with reason: REIMAGE
* 21:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 21:30 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 21:28 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3002.wikimedia.org
* 21:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3001.wikimedia.org
* 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs2007.codfw.wmnet
* 21:17 ryankemper: `ryankemper@wdqs1013:~$ sudo depool`  (catching up on 17.9h lag)
* 21:12 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3001.wikimedia.org
* 21:10 ryankemper: [[phab:T280382|T280382]] [[phab:T281437|T281437]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs2007.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
* 21:10 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
* 20:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh3001.wikimedia.org
* 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts doh3001.wikimedia.org
* 20:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh3002.wikimedia.org
* 20:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3002.wikimedia.org
* 20:00 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3001.wikimedia.org
* 19:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3001.wikimedia.org
* 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e9c981d5173b1d611458f6c70b34d73476b7bbde}}: Revert "enwiktionary: Raise AF emergency disable treshold+count" ([[phab:T283460|T283460]]) (duration: 00m 58s)
* 18:11 urbanecm: Deployed security patch for [[phab:T281972|T281972]]
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|4bf76fc09bc06f76ce842d42b77fe6b036943b69}}: Make DiscussionTools replytool available for everyone on wikitech ([[phab:T283119|T283119]]) (duration: 00m 58s)
* 17:33 legoktm: disabled Kadirselcuk gerrit account, +1 spam (and blocked elsewhere)
* 16:55 legoktm: restarted apache2 on lists1001 for https://gerrit.wikimedia.org/r/697805
* 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:19 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:10 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cescout1001.eqiad.wmnet
* 16:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts cescout1001.eqiad.wmnet
* 13:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 12:05 jbond: enable puppet fleet wide, post changing puppetdb to use nginx-light #[[phab:T164456|T164456]]
* 11:54 jbond: disable puppet fleet wide: changing puppetdb to use nginx-light #[[phab:T164456|T164456]]
* 11:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/includes/actions/InfoAction.php: {{Gerrit|85feaa15d9bbda130541adb6302f31c4372e6519}}: InfoAction: Cast wgNamespaceProtection to array ([[phab:T283751|T283751]]) (duration: 01m 00s)
* 11:08 jbond: update mod_auth_cas [[phab:T264605|T264605]]
* 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f12e368481b6836eefa070ad5dcf52af3f39d479}}: Investigate MediaSearch usability on other wikis ([[phab:T278984|T278984]]) (duration: 00m 57s)
* 11:04 jbond: upload libapache2-mod-auth-cas_1.2-1 for buster and stretch - #[[phab:T264605|T264605]]
* 11:01 jbond: upload libapache2-mod-auth-cas_1.2-1+wmf11u1_amd64.deb - #[[phab:T264605|T264605]]
* 10:44 topranks: Commit pfw policy {{Gerrit|1622570851}} to pfw3-codfw and pfw3-eqiad to support new host fran2001 ([[phab:T282056|T282056]])
* 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:17 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 10:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbstore1006.eqiad.wmnet
* 09:51 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1006.eqiad.wmnet
* 09:14 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280396]] ([[:phab:T284118{{!}}request]])' 'OTRS' 'VRT' 'Quiddity (WMF)' # [[phab:T284118|T284118]]
* 08:12 moritzm: removed eight inactive addresses from ops@ list
* 07:44 moritzm: installing squid security updates
* 06:54 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: REIMAGE
* 06:51 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: REIMAGE
* 06:38 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:34 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16249 and previous config saved to /var/cache/conftool/dbconfig/20210602-050234-root.json [REPLAY FROM 2021-06-02 05:02:34]
* 05:36 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2071', diff saved to https://phabricator.wikimedia.org/P16248 and previous config saved to /var/cache/conftool/dbconfig/20210602-045736-marostegui.json [REPLAY FROM 2021-06-02 04:57:36]
* 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2071', diff saved to https://phabricator.wikimedia.org/P16247 and previous config saved to /var/cache/conftool/dbconfig/20210602-045717-marostegui.json [REPLAY FROM 2021-06-02 04:57:17]
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16246 and previous config saved to /var/cache/conftool/dbconfig/20210602-044730-root.json [REPLAY FROM 2021-06-02 04:47:31]
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16245 and previous config saved to /var/cache/conftool/dbconfig/20210602-043227-root.json [REPLAY FROM 2021-06-02 04:32:27]
* 05:32 razzi@cumin1001: START - Cookbook sre.dns.netbox
* 05:31 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:697671{{!}}Fix pageterms API call for Special:Nearby in Wikidata (T281639)]] (duration: 00m 56s) [REPLAY FROM 2021-06-01 21:44:06]
* 05:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [REPLAY FROM 2021-06-01 19:42:38]
* 05:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox [REPLAY FROM 2021-06-01 19:29:26]
* 05:28 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1183.eqiad.wmnet
* 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16251 and previous config saved to /var/cache/conftool/dbconfig/20210602-051919-marostegui.json
* 05:18 razzi@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1183.eqiad.wmnet
* 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16250 and previous config saved to /var/cache/conftool/dbconfig/20210602-051738-root.json
* off: restart tcpircbot-logmsgbot on alert1001 - [[phab:T284123|T284123]]
* 04:56 marostegui: Test


== 2021-02-05 ==
== 2021-06-01 ==
* 23:37 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1285.eqiad.wmnet
* 21:09 andrewbogott: dropping a bunch of tables from the labswiki db as per [[phab:T284108|T284108]]
* 23:35 ryankemper: [[phab:T267927|T267927]] Re-downloading latest dumps (main database, lexeme) in tmux session `downloads_dumps` on `ryankemper@wdqs1009.eqiad.wmnet`
* 17:23 Amir1: starting deletion of mbox files on lists1001 for mailman2, first reading-web-team.mbox, then smallest lists ([[phab:T282303|T282303]])
* 23:15 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1285.eqiad.wmnet
* 16:31 moritzm: updating debmonitor clients to 0.3.0 (along with cleanup of sysuser UID allocation)
* 22:56 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 15:38 legoktm: stopped mailman2 service on lists1001 ([[phab:T52864|T52864]])
* 22:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 15:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - [[phab:T283223|T283223]]
* 22:50 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 15:16 ryankemper: [[phab:T283223|T283223]] `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic reboot" --reboot --nodes-per-run 1 --start-datetime 2021-05-20T05:16:40 --task-id [[phab:T283223|T283223]]` on `ryankemper@cumin1001` tmux session `restart_cloudelastic`
* 22:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 15:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - [[phab:T283223|T283223]]
* 22:46 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 14:59 topranks: Restoring Lumen CCT {{Gerrit|442550293}} to normal metric / bring back into service ([[phab:T274234|T274234]])
* 22:46 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 13:56 marostegui: Stop mysql on db2079 (codfw master) -  [[phab:T283743|T283743]]
* 22:42 ryankemper: [[phab:T267927|T267927]] `sudo cookbook sre.wdqs.data-reload wdqs1009.eqiad.wmnet --reuse-downloaded-dump --reload-data wikidata --skolemize --reason '[[phab:T267927|T267927]]: Reload wikidata jnl from fresh dumps' --task-id [[phab:T267927|T267927]]` failing with `ERROR org.wikidata.query.rdf.tool.Munge - Fatal error munging RDF`
* 13:53 topranks: Draining Lumen CCT {{Gerrit|442550293}} to do some comparative bandwidth tests from eqiad to codfw ([[phab:T274234|T274234]])
* 22:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 13:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3f757748a14ac8c205f6a5fac0611216c01ceb1c}}: cawiki: Fix help panel links ([[phab:T280673|T280673]]) (duration: 00m 58s)
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 13:48 otto@deploy1002: Finished deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - [[phab:T272973|T272973]] (duration: 02m 58s)
* 22:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
* 13:45 otto@deploy1002: Started deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - [[phab:T272973|T272973]]
* 22:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 13:43 topranks: Restoring Telia CT IC-307235 to normal metric / bring back into service ([[phab:T274234|T274234]])
* 22:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1269.eqiad.wmnet
* 13:08 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE
* 22:32 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1269.eqiad.wmnet
* 13:06 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE
* 22:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1306.eqiad.wmnet
* 12:12 dcausse: re-pooling wdqs1005 (caught-up lag)
* 22:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1306.eqiad.wmnet
* 12:06 moritzm: installing djvulibre security updates
* 22:03 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1285.eqiad.wmnet with reason: REIMAGE
* 11:16 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
* 22:01 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1285.eqiad.wmnet with reason: REIMAGE
* 11:14 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
* 21:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1393.eqiad.wmnet
* 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e4989d2b19e07d2a816cd7f6afae077f86aca54e}}: Enable "Diff" RSS feed on meta ([[phab:T283380|T283380]]) (duration: 00m 58s)
* 21:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1393.eqiad.wmnet
* 11:04 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1392.eqiad.wmnet
* 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling
* 21:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1392.eqiad.wmnet
* 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling
* 21:48 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2266.codfw.wmnet
* 10:38 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1269.eqiad.wmnet with reason: REIMAGE
* 09:37 topranks: Draining Telia CT IC-307235 to do some comparative bandwidth tests from eqiad to codfw ([[phab:T274234|T274234]])
* 21:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1269.eqiad.wmnet with reason: REIMAGE
* 08:04 hashar: Restarted Gerrit on gerrit1001 for Java 11 upgrade # [[phab:T268225|T268225]]
* 21:31 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw1278.eqiad.wmnet
* 08:02 hashar: Restarted Gerrit on gerrit2001 for Java 11 upgrade # [[phab:T268225|T268225]]
* 21:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1306.eqiad.wmnet with reason: REIMAGE
* 07:26 dcausse: depooling wdqs1005 (lag)
* 21:29 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2266.codfw.wmnet
* 07:14 moritzm: installing nginx security updates
* 21:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1306.eqiad.wmnet with reason: REIMAGE
* 05:56 legoktm: restarting mailman3 on lists1001
* 21:09 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2254.codfw.wmnet
* 05:37 legoktm: uploaded django-allauth_0.44.0+ds-1~bpo10+1 mailman3_3.3.3-1~bpo10+4 to apt.wm.o
* 21:05 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2254.codfw.wmnet
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16242 and previous config saved to /var/cache/conftool/dbconfig/20210601-053137-marostegui.json
* 21:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2266.codfw.wmnet with reason: REIMAGE
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16241 and previous config saved to /var/cache/conftool/dbconfig/20210601-052349-root.json
* 21:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2266.codfw.wmnet with reason: REIMAGE
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16240 and previous config saved to /var/cache/conftool/dbconfig/20210601-050845-root.json
* 20:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1307.eqiad.wmnet
* 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16239 and previous config saved to /var/cache/conftool/dbconfig/20210601-045341-root.json
* 20:57 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16238 and previous config saved to /var/cache/conftool/dbconfig/20210601-043837-root.json
* 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1392.eqiad.wmnet with reason: REIMAGE
* 00:46 legoktm@deploy1002: Synchronized logos/config.yaml: Revert "Use eswiki 20th anniversary logos" ([[phab:T280908|T280908]]) (duration: 01m 07s)
* 20:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1397.eqiad.wmnet
* 00:43 legoktm@deploy1002: Synchronized wmf-config/logos.php: Revert "Use eswiki 20th anniversary logos" ([[phab:T280908|T280908]]) (duration: 01m 00s)
* 20:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1397.eqiad.wmnet
* 20:55 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1392.eqiad.wmnet with reason: REIMAGE
* 20:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1393.eqiad.wmnet with reason: REIMAGE
* 20:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1393.eqiad.wmnet with reason: REIMAGE
* 20:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2289.codfw.wmnet
* 20:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1394.eqiad.wmnet
* 20:32 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1395.eqiad.wmnet
* 20:30 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2289.codfw.wmnet
* 20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1394.eqiad.wmnet
* 20:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1395.eqiad.wmnet
* 20:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2254.codfw.wmnet with reason: REIMAGE
* 20:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2254.codfw.wmnet with reason: REIMAGE
* 20:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
* 20:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
* 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2289.codfw.wmnet with reason: REIMAGE
* 19:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2289.codfw.wmnet with reason: REIMAGE
* 19:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1394.eqiad.wmnet with reason: REIMAGE
* 19:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1395.eqiad.wmnet with reason: REIMAGE
* 19:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1394.eqiad.wmnet with reason: REIMAGE
* 19:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1395.eqiad.wmnet with reason: REIMAGE
* 19:39 mutante: reimaging 2 scap proxies in codfw because there are no deployments today
* 15:32 cmjohnson1: replacing optics and fiber on pfw3a-eqiad:xe-0/0/17 and fasw-c1a-eqiad:xe-0/2/0 [[phab:T271295|T271295]]
* 15:28 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@6b74e78]: (no justification provided) (duration: 00m 26s)
* 15:28 oblivian@deploy1001: Started deploy [docker-pkg/deploy@6b74e78]: (no justification provided)
* 14:45 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@7257244]: (no justification provided) (duration: 01m 26s)
* 14:44 oblivian@deploy1001: Started deploy [docker-pkg/deploy@7257244]: (no justification provided)
* 13:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1001.eqiad.wmnet
* 13:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host netflow1001.eqiad.wmnet
* 13:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2001.codfw.wmnet
* 13:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host netflow2001.codfw.wmnet
* 13:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3001.esams.wmnet
* 13:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host netflow3001.esams.wmnet
* 13:02 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4001.ulsfo.wmnet
* 12:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host netflow4001.ulsfo.wmnet
* 12:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5001.eqsin.wmnet
* 12:57 moritzm: reset ifup on netflow5001 [[phab:T273026|T273026]]
* 12:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host netflow5001.eqsin.wmnet
* 12:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp1001.wikimedia.org
* 12:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
* 12:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
* 12:42 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-corp1001.wikimedia.org
* 12:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases1002.eqiad.wmnet
* 12:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host releases1002.eqiad.wmnet
* 12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host releases2002.codfw.wmnet
* 12:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host releases2002.codfw.wmnet
* 12:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp2001.wikimedia.org
* 12:13 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ldap-corp2001.wikimedia.org
* 12:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host parse2001.codfw.wmnet
* 12:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1002.eqiad.wmnet
* 12:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
* 12:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host people1002.eqiad.wmnet
* 12:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
* 12:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host parse2001.codfw.wmnet
* 12:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2001.codfw.wmnet
* 12:00 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@7257244]: (no justification provided) (duration: 01m 00s)
* 11:59 oblivian@deploy1001: Started deploy [docker-pkg/deploy@7257244]: (no justification provided)
* 11:59 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@7257244]: (no justification provided) (duration: 04m 04s)
* 11:55 oblivian@deploy1001: Started deploy [docker-pkg/deploy@7257244]: (no justification provided)
* 11:55 oblivian@deploy1001: Finished deploy [docker-pkg/deploy@7257244]: (no justification provided) (duration: 03m 25s)
* 11:51 oblivian@deploy1001: Started deploy [docker-pkg/deploy@7257244]: (no justification provided)
* 11:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
* 11:44 jayme@deploy1001: Finished deploy [docker-pkg/deploy@7257244]: (no justification provided) (duration: 05m 50s)
* 11:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
* 11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 11:39 ayounsi@cumin1001: START - Cookbook sre.network.cf
* 11:39 jayme@deploy1001: Started deploy [docker-pkg/deploy@7257244]: (no justification provided)
* 11:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
* 11:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
* 11:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
* 11:29 vgutierrez: restart acme-chief instances to catch up on kernel upgrades
* 11:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3001.esams.wmnet
* 11:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3001.esams.wmnet
* 11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir3002.esams.wmnet
* 11:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir3002.esams.wmnet
* 11:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1001.eqiad.wmnet
* 11:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1001.eqiad.wmnet
* 11:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir1002.eqiad.wmnet
* 10:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir1002.eqiad.wmnet
* 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 100%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14222 and previous config saved to /var/cache/conftool/dbconfig/20210205-105345-root.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 75%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14221 and previous config saved to /var/cache/conftool/dbconfig/20210205-103841-root.json
* 10:32 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 10:27 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 50%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14220 and previous config saved to /var/cache/conftool/dbconfig/20210205-102338-root.json
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 25%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14219 and previous config saved to /var/cache/conftool/dbconfig/20210205-100834-root.json
* 10:06 gehel: repooling wdqs1013 - caught up on lag
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 10%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14218 and previous config saved to /var/cache/conftool/dbconfig/20210205-095331-root.json
* 09:45 dcausse: reloading categories from scratch on wdqs1010
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1075 (re)pooling @ 5%: Slowly pooling db1075 after cloning db1157', diff saved to https://phabricator.wikimedia.org/P14217 and previous config saved to /var/cache/conftool/dbconfig/20210205-093827-root.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094 [[phab:T273710|T273710]]', diff saved to https://phabricator.wikimedia.org/P14214 and previous config saved to /var/cache/conftool/dbconfig/20210205-084625-marostegui.json
* 08:29 dcausse: reloading categories from scratch on wdqs1009
* 07:55 gehel: cleanup of leftover ttl dumps on wdqs1009 and wdqs1010
* 07:47 gehel: depooling wdqs1013 and restarting blazegraph
* 07:28 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 07:28 oblivian@cumin1001: START - Cookbook sre.network.cf
* 06:36 marostegui: Stop MySQL on db1075 to clone db1157 [[phab:T258361|T258361]]
* 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14212 and previous config saved to /var/cache/conftool/dbconfig/20210205-063554-marostegui.json
* 03:42 aaron@deploy1001: Synchronized wmf-config/mc.php: {{Gerrit|af5b0effb5e88ac4ca4a06c2c409d303ec405305}} (duration: 01m 06s)
* 03:34 aaron@deploy1001: Synchronized php-1.36.0-wmf.27/includes/libs/rdbms: {{Gerrit|4b386661a9820a002b43bfcef3e18241ea883870}} (duration: 01m 12s)
* 02:03 Krinkle: krinkle@mwmaint1002 Prune globalimagelinks references on s4 database for the deleted ukwikimedia wiki, ref [[phab:T218170|T218170]].
* 01:01 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@85713c1]: restore data range specifier in extract job partition spec (duration: 01m 12s)
* 00:59 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@85713c1]: restore data range specifier in extract job partition spec
* 00:36 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw1278.eqiad.wmnet
* 00:35 legoktm: enabled remote IPMI access on mw1349.mgmt.eqiad.wmnet and mw1380.mgmt.eqiad.wmnet
* 00:24 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@9858513]: transfer_to_es: Wait for link reco, and write to weighted_tags as well (duration: 02m 43s)
* 00:21 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@9858513]: transfer_to_es: Wait for link reco, and write to weighted_tags as well


== 2021-02-04 ==
== 2021-05-31 ==
* 23:59 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@93bf374]: correct hql in ores_predictions_init_v3 (duration: 01m 06s)
* 07:32 legoktm: deleted all outgoing list mail that is for a gmail address being unsubscribed [[phab:T284003|T284003]]
* 23:58 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@93bf374]: correct hql in ores_predictions_init_v3
* 07:30 legoktm: deleted all outgoing list mail that is for a yahoo/aol address being unsubscribed [[phab:T284003|T284003]]
* 23:26 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1278.eqiad.wmnet with reason: REIMAGE
* 07:23 legoktm: deleting all outgoing list mail that has a subject that starts with "You have been unsubscribed from the" [[phab:T284003|T284003]]
* 23:24 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1278.eqiad.wmnet with reason: REIMAGE
* 06:33 legoktm: manually unsubscribed ahalfaker [at] wikimedia.org from scoring-internal list, triggering mailman bounce loop [[phab:T282348|T282348]]#7124014
* 23:05 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1397.eqiad.wmnet
* 06:22 legoktm: sudo systemctl restart mailman3 on lists1001, bounce runner crashed
* 23:05 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1396.eqiad.wmnet
* 23:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1396.eqiad.wmnet
* 23:02 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1397.eqiad.wmnet
* 23:01 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1311.eqiad.wmnet
* 22:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1263.eqiad.wmnet
* 22:39 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1263.eqiad.wmnet
* 22:38 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1311.eqiad.wmnet
* 22:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@700cd49]: partition ores staging tables by data source (duration: 01m 19s)
* 22:37 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@700cd49]: partition ores staging tables by data source
* 22:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1396.eqiad.wmnet with reason: REIMAGE
* 22:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1397.eqiad.wmnet with reason: REIMAGE
* 22:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1396.eqiad.wmnet with reason: REIMAGE
* 22:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1397.eqiad.wmnet with reason: REIMAGE
* 21:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1399.eqiad.wmnet
* 21:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1398.eqiad.wmnet
* 21:53 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2244.codfw.wmnet
* 21:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1311.eqiad.wmnet with reason: REIMAGE
* 21:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1399.eqiad.wmnet
* 21:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1398.eqiad.wmnet
* 21:50 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1311.eqiad.wmnet with reason: REIMAGE
* 21:32 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
* 21:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1263.eqiad.wmnet with reason: REIMAGE
* 21:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1398.eqiad.wmnet with reason: REIMAGE
* 21:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1399.eqiad.wmnet with reason: REIMAGE
* 21:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1398.eqiad.wmnet with reason: REIMAGE
* 21:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1399.eqiad.wmnet with reason: REIMAGE
* 21:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1308.eqiad.wmnet
* 21:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1308.eqiad.wmnet
* 20:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1400.eqiad.wmnet
* 20:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2267.codfw.wmnet
* 20:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2267.wmnet
* 20:38 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2244.codfw.wmnet
* 20:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1400.eqiad.wmnet
* 20:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2267.wmnet
* 20:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1308.eqiad.wmnet with reason: REIMAGE
* 20:23 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1308.eqiad.wmnet with reason: REIMAGE
* 20:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1400.eqiad.wmnet with reason: REIMAGE
* 20:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2267.codfw.wmnet with reason: REIMAGE
* 20:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1400.eqiad.wmnet with reason: REIMAGE
* 20:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2267.codfw.wmnet with reason: REIMAGE
* 19:56 Urbanecm: Purge several recompressed Wikipedia logos
* 19:52 urbanecm@deploy1001: Synchronized logos/config.yaml: Recompress several Wikipedia logos (2/2) (duration: 01m 05s)
* 19:51 urbanecm@deploy1001: Synchronized static/images/project-logos/: Recompress several Wikipedia logos (1/2) (duration: 01m 07s)
* 19:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1309.eqiad.wmnet
* 19:35 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|968ae8b69d7f743f0e589ba3568de36bc462c7d6}}: sysop_itwiki: Set wmgUsePopups to false ([[phab:T259480|T259480]]) (duration: 01m 06s)
* 19:33 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2244.codfw.wmnet with reason: REIMAGE
* 19:31 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2244.codfw.wmnet with reason: REIMAGE
* 19:31 urbanecm@deploy1001: Synchronized wmf-config/abusefilter.php: {{Gerrit|a199b8384f4226b70fc00538f01e41a9a68b3ea3}}: abusefilter: enwikibooks: Enable block action ([[phab:T273864|T273864]]) (duration: 01m 06s)
* 19:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|35e6e4014eee7946979fbf6cd782ae90a3612b82}}: Remove ruwiki A/B test for WelcomeSurvey ([[phab:T273900|T273900]]) (duration: 01m 07s)
* 19:18 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|74e7f70c7c8ae4c8ee9589262d088562c7274b98}}: wgAbuseFilterAflFilterMigrationStage: Make READ_NEW in production ([[phab:T269712|T269712]]) (duration: 01m 11s)
* 19:06 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕜☕ sudo cumin A:cp 'enable-puppet "cdanis deploying {{Gerrit|I498a0c4af}} [[phab:T263496|T263496]]"'
* 19:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2278.codfw.wmnet
* 19:00 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1309.eqiad.wmnet
* 18:59 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1401.eqiad.wmnet
* 18:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2019.codfw.wmnet
* 18:52 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:45 cdanis: [[phab:T263496|T263496]] deployed {{Gerrit|I498a0c4af}} on cp2027 at 18:29; now deploying on cp3060
* 18:45 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2019.codfw.wmnet
* 18:28 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕜☕ sudo cumin A:cp 'disable-puppet "cdanis deploying {{Gerrit|I498a0c4af}} [[phab:T263496|T263496]]"'
* 18:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2278.codfw.wmnet
* 18:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1401.eqiad.wmnet
* 18:05 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: revert - Migrate PrefUpdate schema to Event Platform on  all wikis - leave on testwiki only, seeing validation errors.  [[phab:T267348|T267348]] (duration: 01m 01s)
* 18:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1309.eqiad.wmnet with reason: REIMAGE
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1309.eqiad.wmnet with reason: REIMAGE
* 17:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2278.codfw.wmnet with reason: REIMAGE
* 17:51 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2278.codfw.wmnet with reason: REIMAGE
* 17:51 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate PrefUpdate schema to Event Platform on  all wikis - [[phab:T267348|T267348]] (duration: 01m 01s)
* 17:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1401.eqiad.wmnet with reason: REIMAGE
* 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1401.eqiad.wmnet with reason: REIMAGE
* 17:42 urbanecm@deploy1001: Synchronized wmf-config/logos.php: {{Gerrit|eed3c8e7294d03a62bc71e0a8d9a50044d1edbaa}}: Switch enwiki back to standard logo ([[phab:T272108|T272108]]; resync) (duration: 01m 07s)
* 17:41 urbanecm@deploy1001: Synchronized logos/config.yaml: {{Gerrit|eed3c8e7294d03a62bc71e0a8d9a50044d1edbaa}}: Switch enwiki back to standard logo ([[phab:T272108|T272108]]; 2/2) (duration: 01m 07s)
* 17:38 urbanecm@deploy1001: Synchronized wmf-config/logos.php: {{Gerrit|eed3c8e7294d03a62bc71e0a8d9a50044d1edbaa}}: Switch enwiki back to standard logo ([[phab:T272108|T272108]]; 1/2) (duration: 03m 12s)
* 16:46 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Migrate PrefUpdate schema to Event Platform on  testwiki - [[phab:T267348|T267348]] (duration: 01m 08s)
* 16:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3002.esams.wmnet
* 16:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti3002.esams.wmnet
* 16:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2023.codfw.wmnet
* 16:00 moritzm: draining ganeti3002 for eventual reboot
* 15:57 moritzm: failover ganeti master in esams to ganeti3001
* 15:56 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2023.codfw.wmnet
* 15:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2022.codfw.wmnet
* 15:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3001.esams.wmnet
* 15:55 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti3001.esams.wmnet
* 15:40 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2022.codfw.wmnet
* 15:29 moritzm: draining ganeti3001 for eventual reboot
* 15:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3003.esams.wmnet
* 15:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2021.codfw.wmnet
* 15:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti3003.esams.wmnet
* 15:20 moritzm: draining ganeti3003 for eventual reboot
* 15:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2021.codfw.wmnet
* 15:11 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2020.codfw.wmnet
* 15:01 jynus@cumin1001: START - Cookbook sre.hosts.decommission
* 14:54 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2020.codfw.wmnet
* 14:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2019.codfw.wmnet
* 14:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir2001.codfw.wmnet
* 14:43 jynus: stop db1095 instance in preparation of its decom [[phab:T273732|T273732]]
* 14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2001.codfw.wmnet
* 14:38 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 14:37 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2019.codfw.wmnet
* 14:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5002.eqsin.wmnet
* 14:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir2002.codfw.wmnet
* 14:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir2002.codfw.wmnet
* 14:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti5002.eqsin.wmnet
* 14:21 godog: roll-restart rsync/swift-object-replicator in codfw to apply memory limits
* 14:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4001.ulsfo.wmnet
* 14:18 effie: start rolling reboots of  mc[2019-2027,2029-2037].codfw.wmnet [[phab:T273278|T273278]]
* 14:16 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@47fc426]: (no justification provided) (duration: 00m 12s)
* 14:16 mbsantos@deploy1001: Started deploy [kartotherian/deploy@47fc426]: (no justification provided)
* 14:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4001.ulsfo.wmnet
* 14:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4002.ulsfo.wmnet
* 14:14 moritzm: installing ffmpeg security updates on stretch
* 14:11 mbsantos@deploy1001: Finished deploy [kartotherian/deploy@0a38bc5]: (no justification provided) (duration: 00m 03s)
* 14:11 mbsantos@deploy1001: Started deploy [kartotherian/deploy@0a38bc5]: (no justification provided)
* 14:10 mbsantos@deploy1001: Finished deploy [tilerator/deploy@46a2eaf]: (no justification provided) (duration: 00m 13s)
* 14:10 mbsantos@deploy1001: Started deploy [tilerator/deploy@46a2eaf]: (no justification provided)
* 14:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir4002.ulsfo.wmnet
* 14:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5001.eqsin.wmnet
* 13:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: NO-OP: {{Gerrit|7c67b2f03cbc27cf9e5f214a6f0ea0856d8c1ae4}}: bnwiki: wgGEHelpPanelLinks: Remove text in brackets ([[phab:T266020|T266020]]) (duration: 01m 12s)
* 13:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5001.eqsin.wmnet
* 13:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir5002.eqsin.wmnet
* 13:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.reboot-single for host ncredir5002.eqsin.wmnet
* 13:44 vgutierrez: rolling restart of ncredir instances (kernel upgrade)
* 13:36 moritzm: installing openldap security updates on buster (client-side tools/libs only, slapd instance already updated)
* 13:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 13:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1003.eqiad.wmnet
* 13:31 godog: reboot logstash2005.codfw.wmnet, no ssh / stuck
* 13:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
* 13:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mwdebug1003.eqiad.wmnet
* 13:10 jbond42: upload cas_6.2.7 to downgrade cas [[phab:T273867|T273867]]
* 13:04 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1010.eqiad.wmnet with reason: REIMAGE
* 13:02 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1010.eqiad.wmnet with reason: REIMAGE
* 12:27 moritzm: installing libdatetime-timezone-perl updates on Buster
* 12:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 17 hosts with reason: reboot
* 12:17 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 4:00:00 on 17 hosts with reason: reboot
* 12:17 moritzm: rebooting mw[1264-1268,1276-1277,1337-1338,1404-1409,1411,1413].eqiad.wmnet for kernel update
* 12:08 godog: bounce rsyslog on centrallog1001
* 11:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian,name=maps1009.eqiad.wmnet
* 11:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=maps,service=kartotherian-ssl,name=maps1009.eqiad.wmnet
* 11:30 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 11:26 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
* 11:07 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams-internal
* 10:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 93 hosts with reason: reboot
* 10:35 moritzm: rebooting mw[2261-2262,2268-2271,2273-2277,2283-2288,2290-2335,2337-2339,2350-2376].codfw.wmnet
* 10:34 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 4:00:00 on 93 hosts with reason: reboot
* 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14204 and previous config saved to /var/cache/conftool/dbconfig/20210204-102312-root.json
* 10:15 elukey: restart pybal on lvs1015 (low-traffic active) to pick up new changes for eventstreams-internal (new VIP) - [[phab:T269160|T269160]]
* 10:13 elukey: restart pybal on lvs2009 (low-traffic active) to pick up new changes for eventstreams-internal (new VIP) - [[phab:T269160|T269160]]
* 10:08 elukey: restart pybal on lvs1016 (low-traffic standby) to pick up new changes for eventstreams-internal (new VIP) - [[phab:T269160|T269160]]
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14203 and previous config saved to /var/cache/conftool/dbconfig/20210204-100808-root.json
* 10:05 elukey: restart pybal on lvs2010 (low-traffic standby) to pick up new changes for eventstreams-internal (new VIP) - [[phab:T269160|T269160]]
* 09:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 60%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14202 and previous config saved to /var/cache/conftool/dbconfig/20210204-095305-root.json
* 09:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 09:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 37 hosts with reason: reboot
* 09:44 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 4:00:00 on 37 hosts with reason: reboot
* 09:41 moritzm: rebooting mw[2215-2219,2221-2243,2246-2249,2251-2253,2255,2258] for kernel update
* 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14201 and previous config saved to /var/cache/conftool/dbconfig/20210204-093801-root.json
* 09:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flowspec1001.eqiad.wmnet
* 09:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host flowspec1001.eqiad.wmnet
* 09:24 XioNoX: re-enable ping offload in esams - [[phab:T273278|T273278]]
* 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1078 from dbctl [[phab:T273597|T273597]]', diff saved to https://phabricator.wikimedia.org/P14199 and previous config saved to /var/cache/conftool/dbconfig/20210204-092414-marostegui.json
* 09:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping3001.esams.wmnet
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 30%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14198 and previous config saved to /var/cache/conftool/dbconfig/20210204-092257-root.json
* 09:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ping3001.esams.wmnet
* 09:17 XioNoX: disable ping offload in esams (eqiad re-enabled) - [[phab:T273278|T273278]]
* 09:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1001.eqiad.wmnet
* 09:15 godog: roll restart lvs low-traffic in codfw/eqiad for swift healthcheck updates
* 09:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ping1001.eqiad.wmnet
* 09:10 XioNoX: disable ping offload in eqiad (codfw re-enabled) - [[phab:T273278|T273278]]
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14197 and previous config saved to /var/cache/conftool/dbconfig/20210204-090754-root.json
* 09:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2001.codfw.wmnet
* 09:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ping2001.codfw.wmnet
* 09:02 XioNoX: disable ping offload in codfw - [[phab:T273278|T273278]]
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 20%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14196 and previous config saved to /var/cache/conftool/dbconfig/20210204-085250-root.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 15%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14195 and previous config saved to /var/cache/conftool/dbconfig/20210204-083747-root.json
* 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 08:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 12%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14194 and previous config saved to /var/cache/conftool/dbconfig/20210204-082243-root.json
* 08:22 moritzm: reset failed ifup@ens5 on xhgui2001/xhgui1001 [[phab:T273026|T273026]]
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14193 and previous config saved to /var/cache/conftool/dbconfig/20210204-081605-root.json
* 08:10 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1009.eqiad.wmnet with reason: REIMAGE
* 08:08 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1009.eqiad.wmnet with reason: REIMAGE
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14192 and previous config saved to /var/cache/conftool/dbconfig/20210204-080740-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14191 and previous config saved to /var/cache/conftool/dbconfig/20210204-080101-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 7%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14190 and previous config saved to /var/cache/conftool/dbconfig/20210204-075236-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14189 and previous config saved to /var/cache/conftool/dbconfig/20210204-074558-root.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 5%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14188 and previous config saved to /var/cache/conftool/dbconfig/20210204-073733-root.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14187 and previous config saved to /var/cache/conftool/dbconfig/20210204-073054-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 3%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14186 and previous config saved to /var/cache/conftool/dbconfig/20210204-072229-root.json
* 07:16 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1117.eqiad.wmnet
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 20%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14185 and previous config saved to /var/cache/conftool/dbconfig/20210204-071551-root.json
* 07:13 elukey@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1117.eqiad.wmnet
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 2%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14184 and previous config saved to /var/cache/conftool/dbconfig/20210204-070726-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14183 and previous config saved to /var/cache/conftool/dbconfig/20210204-070047-root.json
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 5%: Repool db1137 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14182 and previous config saved to /var/cache/conftool/dbconfig/20210204-064544-root.json
* 06:42 marostegui: Restart mysql on db1137
* 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P14181 and previous config saved to /var/cache/conftool/dbconfig/20210204-064157-marostegui.json
* 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 1%: Slowly pooling db1173 for the first time in s6', diff saved to https://phabricator.wikimedia.org/P14180 and previous config saved to /var/cache/conftool/dbconfig/20210204-063033-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1173 to dbctl - depooled [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14179 and previous config saved to /var/cache/conftool/dbconfig/20210204-062836-marostegui.json
* 02:02 legoktm@deploy1001: Synchronized logos/config.yaml: Update and recompress logos for nowiki, cawiki, fiwiki, ukwiki, cswiki, huwiki, trwiki (2/2) (duration: 01m 06s)
* 02:00 legoktm@deploy1001: Synchronized static/images/project-logos/: Update and recompress logos for nowiki, cawiki, fiwiki, ukwiki, cswiki, huwiki, trwiki (1/2) (duration: 01m 10s)
* 01:15 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@4b4872d]: transfer_to_es: Increase timeout waiting for source data to three hours (duration: 01m 16s)
* 01:13 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@4b4872d]: transfer_to_es: Increase timeout waiting for source data to three hours
* 01:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1310.eqiad.wmnet
* 00:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1318.eqiad.wmnet
* 00:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1310.eqiad.wmnet
* 00:44 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1318.eqiad.wmnet
* 00:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2279.codfw.wmnet
* 00:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2280.codfw.wmnet
* 00:17 eileen: civicrm revision changed from {{Gerrit|dfb2ea2148}} to {{Gerrit|1e9a86dd6e}}, config revision is {{Gerrit|01ea3062f4}}
* 00:12 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2279.codw.wmnet
* 00:11 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2280.codfw.wmnet
* 00:05 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1310.eqiad.wmnet with reason: REIMAGE
* 00:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1310.eqiad.wmnet with reason: REIMAGE


== 2021-05-29 ==
* 14:44 elukey: execute apt-get clean on an-airflow1001 to free space
* 14:40 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp1087.eqiad.wmnet


== 2021-02-03 ==
* 23:59 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1318.eqiad.wmnet with reason: REIMAGE
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1318.eqiad.wmnet with reason: REIMAGE
* 23:51 mutante: installservers: replacing squid proxy logrotate cron with systemd timer
* 23:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2279.codfw.wmnet with reason: REIMAGE
* 23:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2280.codfw.wmnet with reason: REIMAGE
* 23:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2279.codfw.wmnet with reason: REIMAGE
* 23:46 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2280.codfw.wmnet with reason: REIMAGE
* 22:53 razzi@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 22:06 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox1001.wikimedia.org
* 21:53 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single for host netbox1001.wikimedia.org
* 21:53 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1001.eqiad.wmnet
* 21:46 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single for host netboxdb1001.eqiad.wmnet
* 21:44 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox2001.wikimedia.org
* 21:40 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single for host netbox2001.wikimedia.org
* 21:39 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2001.codfw.wmnet
* 21:34 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single for host netboxdb2001.codfw.wmnet
* 21:33 chaomodus: rebooting Netbox cluster
* 21:05 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 20:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:03 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1334.eqiad.wmnet
* 20:01 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1334.eqiad.wmnet
* 19:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2281.codfw.wmnet
* 19:31 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2282.codfw.wmnet
* 19:21 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2281.codfw.wmnet
* 19:17 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2282.codfw.wmnet
* 19:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1334.eqiad.wmnet with reason: REIMAGE
* 19:10 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1334.eqiad.wmnet with reason: REIMAGE
* 19:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|56351f0434be36f4a639f98986d7785dd4d0b14d}}: kowiki: Fix wgGEHelpPanelHelpDeskTitle ([[phab:T273799|T273799]]) (duration: 01m 10s)
* 18:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2282.codfw.wmnet with reason: REIMAGE
* 18:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2281.codfw.wmnet with reason: REIMAGE
* 18:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2282.codfw.wmnet with reason: REIMAGE
* 18:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2281.codfw.wmnet with reason: REIMAGE
* 18:32 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test cluster: Reboot kafka nodes - razzi@cumin1001
* 18:26 dduvall@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 18:23 dduvall@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 18:13 dduvall@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 17:01 elukey@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=eventstreams-internal
* 16:44 mbsantos@deploy1001: deploy aborted: (no justification provided) (duration: 00m 00s)
* 16:44 mbsantos@deploy1001: Started deploy [tilerator/deploy@46a2eaf] (imposm): (no justification provided)
* 16:44 mbsantos@deploy1001: deploy aborted: (no justification provided) (duration: 00m 01s)
* 16:44 mbsantos@deploy1001: Started deploy [tilerator/deploy@46a2eaf] (beta): (no justification provided)
* 16:37 mbsantos@deploy1001: deploy aborted: Deploy Tilerator build for buster machines (duration: 00m 03s)
* 16:37 mbsantos@deploy1001: Started deploy [tilerator/deploy@46a2eaf] (imposm): Deploy Tilerator build for buster machines
* 16:37 mbsantos@deploy1001: deploy aborted: imposm Deploy Tilerator build for buster machines (duration: 00m 03s)
* 16:37 mbsantos@deploy1001: Started deploy [tilerator/deploy@46a2eaf] (nvironment): imposm Deploy Tilerator build for buster machines
* 16:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2001.codfw.wmnet
* 16:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host peek2001.codfw.wmnet
* 16:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host people2001.codfw.wmnet
* 16:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
* 16:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host peek2001.codfw.wmnet
* 16:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host planet1002.eqiad.wmnet
* 16:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
* 16:28 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host planet1002.eqiad.wmnet
* 16:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host planet2002.codfw.wmnet
* 16:26 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4002.ulsfo.wmnet
* 16:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host planet2002.codfw.wmnet
* 16:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb1002.eqiad.wmnet
* 16:19 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host miscweb1002.eqiad.wmnet
* 16:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti4002.ulsfo.wmnet
* 16:16 moritzm: draining ganeti4002 for eventual reboot
* 16:13 moritzm: failover ganeti master in ulsfo to ganeti4003
* 16:13 volans: enabled puppet on install1003 after the test [[phab:T221388|T221388]]
* 16:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4003.ulsfo.wmnet
* 16:08 elukey@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams-internal
* 16:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti4003.ulsfo.wmnet
* 16:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host labweb1002.wikimedia.org
* 16:00 moritzm: draining ganeti4003 for eventual reboot
* 15:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host labweb1002.wikimedia.org
* 15:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host labweb1001.wikimedia.org
* 15:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti4001.ulsfo.wmnet
* 15:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host labweb1001.wikimedia.org
* 15:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
* 15:49 moritzm: draining ganeti4001 for eventual reboot
* 15:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
* 15:46 hnowlan: one-off installing imposm3 on maps1009
* 15:32 volans: disabling puppet on install1003 for a quick test for [[phab:T221388|T221388]]
* 15:18 moritzm: installing ca-certificates update for buster (reverting the Symantec CA blacklist, related to GeoTrust CA)
* 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14171 and previous config saved to /var/cache/conftool/dbconfig/20210203-150411-root.json
* 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14170 and previous config saved to /var/cache/conftool/dbconfig/20210203-144908-root.json
* 14:39 akosiaris@cumin1001: conftool action : set/pooled=True; selector: dnsdisc=linkrecommendation
* 14:38 akosiaris@cumin1001: conftool action : set/pooled=True; selector: dnsdisc=similar-users
* 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14169 and previous config saved to /var/cache/conftool/dbconfig/20210203-143404-root.json
* 14:20 moritzm: installing openldap security updates on serpens/seaborgium
* 14:19 godog: test memory limits on swift-object-replicator on ms-be2050 - [[phab:T221904|T221904]]
* 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14168 and previous config saved to /var/cache/conftool/dbconfig/20210203-141901-root.json
* 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 20%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14167 and previous config saved to /var/cache/conftool/dbconfig/20210203-140357-root.json
* 13:58 godog: swift codfw-prod decrease HDD weight for ms-be20[16-27] - [[phab:T272837|T272837]]
* 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14166 and previous config saved to /var/cache/conftool/dbconfig/20210203-134854-root.json
* 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: Repool db1120 after daemon restart', diff saved to https://phabricator.wikimedia.org/P14165 and previous config saved to /var/cache/conftool/dbconfig/20210203-133350-root.json
* 13:30 marostegui: Stop mysql on db1120 to enable report_host [[phab:T266483|T266483]]
* 13:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host xhgui1001.eqiad.wmnet
* 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P14164 and previous config saved to /var/cache/conftool/dbconfig/20210203-132938-marostegui.json
* 13:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host xhgui2001.codfw.wmnet
* 13:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host xhgui1001.eqiad.wmnet
* 13:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host xhgui2001.codfw.wmnet
* 13:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2001.wikimedia.org
* 13:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1001.wikimedia.org
* 13:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader2001.wikimedia.org
* 13:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host urldownloader1001.wikimedia.org
* 12:46 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetmaster2001.codfw.wmnet
* 12:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster2002.codfw.wmnet
* 12:38 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster2003.codfw.wmnet
* 12:35 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster2001.codfw.wmnet
* 12:35 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetmaster1001.eqiad.wmnet
* 12:35 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetdb2002.codfw.wmnet
* 12:34 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster2002.codfw.wmnet
* 12:34 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster2003.codfw.wmnet
* 12:34 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1002.eqiad.wmnet
* 12:33 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1003.eqiad.wmnet
* 12:29 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1003.eqiad.wmnet
* 12:29 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1002.eqiad.wmnet
* 12:29 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetmaster1001.eqiad.wmnet
* 12:28 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host puppetdb1002.eqiad.wmnet
* 12:26 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetdb2002.codfw.wmnet
* 12:25 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host puppetdb1002.eqiad.wmnet
* 12:22 jbond42: disable puppet fleet wide to reboot puppetmaster,puppetdb
* 12:19 moritzm: installing openldap security updates on LDAP replicas
* 11:20 jbond42: update puppetlabs-stdlib to v6.6.0
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14163 and previous config saved to /var/cache/conftool/dbconfig/20210203-110236-root.json
* 10:54 elukey@deploy1001: Finished deploy [analytics/refinery@8b8f0cf]: Weekly deployment (duration: 11m 06s)
* 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 100%: Slowly pooling db1093 after cloning db1173', diff saved to https://phabricator.wikimedia.org/P14162 and previous config saved to /var/cache/conftool/dbconfig/20210203-105057-root.json
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 85%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14161 and previous config saved to /var/cache/conftool/dbconfig/20210203-104733-root.json
* 10:43 elukey@deploy1001: Started deploy [analytics/refinery@8b8f0cf]: Weekly deployment
* 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 75%: Slowly pooling db1093 after cloning db1173', diff saved to https://phabricator.wikimedia.org/P14160 and previous config saved to /var/cache/conftool/dbconfig/20210203-103554-root.json
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14159 and previous config saved to /var/cache/conftool/dbconfig/20210203-103229-root.json
* 10:28 vgutierrez: rolling restart of varnish-fe on cp5002 and cp5003
* 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 50%: Slowly pooling db1093 after cloning db1173', diff saved to https://phabricator.wikimedia.org/P14158 and previous config saved to /var/cache/conftool/dbconfig/20210203-102050-root.json
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 60%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14157 and previous config saved to /var/cache/conftool/dbconfig/20210203-101726-root.json
* 10:16 legoktm: re-enabled puppet on mw2295 ([[phab:T273726|T273726]])
* 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 25%: Slowly pooling db1093 after cloning db1173', diff saved to https://phabricator.wikimedia.org/P14156 and previous config saved to /var/cache/conftool/dbconfig/20210203-100547-root.json
* 10:05 gehel: depooling and restarting blazegraph on wdqs1007
* 10:04 hashar@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/Echo/includes/api/ApiEchoUnreadNotificationPages.php: Add missing isset() check to ApiEchoUnreadNotificationPages - [[phab:T273479|T273479]] (duration: 01m 14s)
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14155 and previous config saved to /var/cache/conftool/dbconfig/20210203-100222-root.json
* 09:57 marostegui: m2 master restart - [[phab:T272964|T272964]]
* 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 10%: Slowly pooling db1093 after cloning db1173', diff saved to https://phabricator.wikimedia.org/P14154 and previous config saved to /var/cache/conftool/dbconfig/20210203-095043-root.json
* 09:50 XioNoX: disable DE-CIX codfw peering sessions
* 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 40%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14153 and previous config saved to /var/cache/conftool/dbconfig/20210203-094719-root.json
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1093 (re)pooling @ 5%: Slowly pooling db1093 after cloning db1173', diff saved to https://phabricator.wikimedia.org/P14152 and previous config saved to /var/cache/conftool/dbconfig/20210203-093540-root.json
* 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 30%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14151 and previous config saved to /var/cache/conftool/dbconfig/20210203-093215-root.json
* 09:30 vgutierrez: depool cp5006
* 09:26 vgutierrez: rolling restart varnish-fe on cp5004-5006
* 09:20 _joe_: restarting varnish-frontend on cp5001
* 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14150 and previous config saved to /var/cache/conftool/dbconfig/20210203-091712-root.json
* 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 20%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14149 and previous config saved to /var/cache/conftool/dbconfig/20210203-090208-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 15%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14148 and previous config saved to /var/cache/conftool/dbconfig/20210203-084705-root.json
* 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 13%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14147 and previous config saved to /var/cache/conftool/dbconfig/20210203-083201-root.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14146 and previous config saved to /var/cache/conftool/dbconfig/20210203-081658-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 8%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14145 and previous config saved to /var/cache/conftool/dbconfig/20210203-080154-root.json
* 07:49 marostegui: Stop mysql on db1093 to clone db1173 [[phab:T258361|T258361]]
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 to clone db1173 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P14143 and previous config saved to /var/cache/conftool/dbconfig/20210203-074749-marostegui.json
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Slowly pool db1174 into s7', diff saved to https://phabricator.wikimedia.org/P14142 and previous config saved to /var/cache/conftool/dbconfig/20210203-074651-root.json
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Give some more weight to db1174', diff saved to https://phabricator.wikimedia.org/P14141 and previous config saved to /var/cache/conftool/dbconfig/20210203-071310-marostegui.json
* 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078 - will be decommissioned', diff saved to https://phabricator.wikimedia.org/P14139 and previous config saved to /var/cache/conftool/dbconfig/20210203-064137-marostegui.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1174 with minimal weight for the first time in s7', diff saved to https://phabricator.wikimedia.org/P14138 and previous config saved to /var/cache/conftool/dbconfig/20210203-063812-marostegui.json
* 00:16 jhuneidi@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 00:13 legoktm@deploy1001: Synchronized logos/: Update and recompress logos for nlwiki, eswiki, ptwiki, ruwiki, svwiki, zhwiki (2/2) (duration: 01m 05s)
* 00:12 legoktm@deploy1001: Synchronized static/images/project-logos/: Update and recompress logos for nlwiki, eswiki, ptwiki, ruwiki, svwiki, zhwiki (1/2) (duration: 01m 10s)
* 00:10 jhuneidi@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .


== 2021-05-28 ==
* 08:06 oblivian@cumin1001: conftool action : set/pooled=inactive; selector: name=wdqs1003.eqiad.wmnet,dc=eqiad
* 08:02 elukey: restart blazegraph on wdqs1011
* 01:43 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:696736{{!}}ExtensionDistributor: REL1_36 is now the stable release (T279455)]] (duration: 00m 57s)


== 2021-02-02 ==
* 23:53 mutante: mw1300 - scap pull (it crashed earlier but is back after powercycling)
* 23:52 jhuneidi@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 23:30 mutante: powercycling crashed mw1300.eqiad.wmnet
* 21:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1335.eqiad.wmnet
* 21:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1336.eqiad.wmnet
* 21:56 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1335.eqiad.wmnet
* 21:55 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1336.eqiad.wmnet
* 21:09 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1335.eqiad.wmnet with reason: REIMAGE
* 21:08 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1336.eqiad.wmnet with reason: REIMAGE
* 21:06 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1335.eqiad.wmnet with reason: REIMAGE
* 21:05 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1336.eqiad.wmnet with reason: REIMAGE
* 20:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕒☕ sudo cumin A:cp 'enable-puppet "cdanis deploying {{Gerrit|I7003b7b6}} and {{Gerrit|Idd0e124f5}} [[phab:T263496|T263496]]"'  # test on cp2027 looks good, perhaps slightly-increased Varnish CPU consumption but hard to be sure
* 20:00 Lucas_WMDE: Morning backport window done
* 19:58 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/WikibaseMediaInfo/: Backport: [[gerrit:661092{{!}}Pass $databaseName into WikiPageEntityDataLoader (T273622)]] (duration: 01m 07s)
* 19:57 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.29/extensions/Wikibase/: Backport: [[gerrit:661091{{!}}Add wiki ID to WikiPageEntityDataLoader (T273622)]] (duration: 01m 25s)
* 19:52 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕒☕ sudo cumin A:cp 'disable-puppet "cdanis deploying {{Gerrit|I7003b7b6}} and {{Gerrit|Idd0e124f5}} [[phab:T263496|T263496]]"'
* 19:00 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 18:48 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 18:43 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 18:23 milimetric@deploy1001: Finished deploy [analytics/turnilo/deploy@052348b]: (no justification provided) (duration: 00m 03s)
* 18:23 milimetric@deploy1001: Started deploy [analytics/turnilo/deploy@052348b]: (no justification provided)
* 18:22 milimetric@deploy1001: deploy aborted: (no justification provided) (duration: 00m 10s)
* 18:22 milimetric@deploy1001: Started deploy [analytics/turnilo/deploy@052348b]: (no justification provided)
* 18:17 mbsantos@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:07 mbsantos@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:03 mbsantos@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host auth2001.codfw.wmnet
* 16:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host auth1002.eqiad.wmnet
* 16:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host auth1002.eqiad.wmnet
* 16:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host auth2001.codfw.wmnet
* 15:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb2002.codfw.wmnet
* 15:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 15:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host miscweb2002.codfw.wmnet
* 15:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 100%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14135 and previous config saved to /var/cache/conftool/dbconfig/20210202-143950-root.json
* 14:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1001.eqiad.wmnet
* 14:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host failoid1001.eqiad.wmnet
* 14:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
* 14:35 klausman@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2003.codfw.wmnet
* 14:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2002.codfw.wmnet
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
* 14:26 hashar@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.29 (duration: 73m 10s)
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host debmonitor2002.codfw.wmnet
* 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 75%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14134 and previous config saved to /var/cache/conftool/dbconfig/20210202-142446-root.json
* 14:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
* 14:21 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd2003.codfw.wmnet
* 14:12 klausman@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-etcd2002.codfw.wmnet
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1094 (re)pooling @ 50%: Repool db1094 after cloning another host', diff saved to https://phabricator.wikimedia.org/P14133 and previous config saved to /var/cache/conftool/dbconfig/20210202-140943-root.json