You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host)
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(81 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== 2021-05-02 ==
== 2021-08-03 ==
* 13:40 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:40 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
 
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
== 2021-05-01 ==
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:12 Urbanecm: Invalidate password for MaraBot@SUL ([[phab:T281586|T281586]])
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:58 legoktm@deploy1002: Synchronized logos/config.yaml: Add eswiki 20th anniversary logos (duration: 00m 57s)
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 16:56 legoktm@deploy1002: Synchronized wmf-config/logos.php: Use eswiki 20th anniversary logos ([[phab:T280908|T280908]]) (duration: 00m 56s)
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 16:50 legoktm@deploy1002: Synchronized static/images/project-logos/: Add eswiki 20th anniversary logos (duration: 00m 57s)
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 07:22 elukey: powercycle elastic2033 - no ssh, no tty available via mgmt
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master  [[phab:T287574|T287574]]
* 09:12 moritzm: installinh php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2021-04-30 ==
== 2021-08-02 ==
* 21:54 mutante: people1003 - rsycncing /home from peopel1002
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:30 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 15:29 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 15:25 bstorm: hard rebooting cloudmetrics1002 [[phab:T275605|T275605]]
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:40 ladsgroup@deploy1002: Synchronized static/favicon/wikitech.ico: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 56s)
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 11:36 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech-1.5x.png: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 56s)
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:34 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech-2x.png: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 57s)
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:33 ladsgroup@deploy1002: Synchronized static/images/project-logos/wikitech.png: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 57s)
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:31 ladsgroup@deploy1002: Synchronized logos/config.yaml: Config: [[gerrit:683835{{!}}Update wikitech logo]] (duration: 00m 57s)
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 09:04 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 09:03 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: primary nic disconnected
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 08:11 moritzm: remove mc1027 from debmonitor, server is broken and won't return ([[phab:T276415|T276415]])
* 21:31 tzatziki: removing 1 file for legal compliance
* 07:38 moritzm: installing iputils updates from Buster point release
* 21:16 tzatziki: removing 7 files for legal compliance
* 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15667 and previous config saved to /var/cache/conftool/dbconfig/20210430-061549-root.json
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15666 and previous config saved to /var/cache/conftool/dbconfig/20210430-060046-root.json
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:51 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15665 and previous config saved to /var/cache/conftool/dbconfig/20210430-054542-root.json
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P15664 and previous config saved to /var/cache/conftool/dbconfig/20210430-053038-root.json
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 05:16 marostegui: Upgrade kernel on db1114
* 19:00 urbanecm: Morning B&C window completed
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 to enable report_host [[phab:T266483|T266483]]', diff saved to https://phabricator.wikimedia.org/P15663 and previous config saved to /var/cache/conftool/dbconfig/20210430-051558-marostegui.json
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 05:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1080.eqiad.wmnet
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 04:57 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1080.eqiad.wmnet
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:56 ryankemper: [WDQS] `ryankemper@wdqs1006:~$ sudo systemctl restart wdqs-blazegraph`
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:43 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 04:43 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:42 ryankemper: [[phab:T261239|T261239]] `elastic2033`, which is known to be in a state of hardware failure (we have a ticket open), is holding up the reboot of codfw. I don't think we have a good way to exclude a node currently. Going to just proceed to `eqiad` for now
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 04:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 04:39 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 04:39 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 04:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 04:03 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet with reason: REIMAGE
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 03:50 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1010.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 03:47 ryankemper: [[phab:T280563|T280563]] about half of codfw nodes have been rebooted before the failure caused by write queue not emptying fast enough, kicking it off again:`sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:45 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 01:08 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 12:20 mutante: gerrit servers: disabling puppet
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 11:27 hashar: restarting Jenkins on contint2001
* 11:27 hashar: restarting Jenkins on contint1001
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 urbanecm: EU B&C window completed
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 11:08 moritzm: installing openjdk-11 security updates
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 07:24 moritzm: installing libsndfile security updates on buster
* 07:12 moritzm: installing aspell security updates
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)


== 2021-04-29 ==
== 2021-07-31 ==
* 23:36 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:683749{{!}}Revert "DEMO: Add newline to README"]] (duration: 00m 56s)
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 23:18 ryankemper: [[phab:T280563|T280563]] successful reboot of `relforge100[3,4]`; `relforge` cluster is back to green status.
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
* 23:16 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:683747{{!}}DEMO: Add newline to README]] (duration: 00m 56s)
* 23:08 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts` (amended command)
* 23:06 ryankemper: [[phab:T280563|T280563]] `sudo -i cookbook sre.elasticsearch.rolling-operation codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id [[phab:T280563|T280563]]` on `ryankemper@cumin1001` tmux session `elastic_restarts`
* 23:05 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:46 ryankemper: [[phab:T280563|T280563]] Current master is `relforge1003-relforge-eqiad`, will reboot `1004` first then `1003` after
* 22:44 ryankemper: [[phab:T280563|T280563]] Bleh, we never moved the new config into spicerack, so it's trying to talk to the old relforge hosts which no longer exist. Will reboot relforge manually and use the cookbook for codfw/eqiad, and circle back later for the spicerack change
* 22:37 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:32 ryankemper: [[phab:T280563|T280563]] Spotted the issue; forgot to set `--without-lvs` for relforge reboot
* 22:27 ryankemper: [[phab:T280563|T280563]] `urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7fbe4bb8a518>: Failed to establish a new connection: [Errno -2] Name or service not known`
* 22:26 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge restart - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:26 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge restart - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:21 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:21 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 22:20 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (2 nodes at a time) for ElasticSearch cluster relforge: relforge reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 21:36 mutante: icinga - enabling disabled notifications for random an-worker nodes where mgmt interface had enabled alerts but the actual host didnt
* 21:32 mutante: icinga - enabled notifications for checks on ms-backup1001 - they were all manually disabled but none of the checks had any status change since 50 days which indicates it was forgotten to turn them back on which is a common issue with disabling notifications
* 21:16 mutante: backup1001 - sudo check_bacula.py --icinga
* 20:54 marostegui: Stop mysql on tendril for the UTC night, dbtree and tendrill will remain down for a few hours [[phab:T281486|T281486]]
* 20:16 marostegui: Restart tendril database - [[phab:T281486|T281486]]
* 20:00 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]]
* 19:46 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]] (duration: 01m 08s)
* 19:45 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]]
* 19:32 dpifke@deploy1002: Finished deploy [performance/navtiming@e7ad939]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/683484 (duration: 00m 05s)
* 19:32 dpifke@deploy1002: Started deploy [performance/navtiming@e7ad939]: Deploy https://gerrit.wikimedia.org/r/c/performance/navtiming/+/683484
* 19:01 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/MediaWiki/wanobjectcache/revision_row_1/ (bad data from Sep 2019)
* 18:59 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/rl-minify-* (bad data from Aug 2018)
* 18:58 Krinkle: graphite1004/2003: prune /var/lib/carbon/whisper/MediaWiki_ExternalGuidance_init_Google_tr_fr (bad data from Nov 2019)
* 18:38 krinkle@deploy1002: Synchronized php-1.37.0-wmf.1/includes/libs/objectcache/MemcachedBagOStuff.php: {{Gerrit|I926797a9d494a31}}, [[phab:T281480|T281480]] (duration: 01m 08s)
* 18:33 mutante: LDAP - added mmandere to wmf group ([[phab:T281344|T281344]])
* 18:10 krinkle@deploy1002: Synchronized php-1.37.0-wmf.3/includes/libs/objectcache/MemcachedBagOStuff.php: {{Gerrit|I926797a9d494a31}}, [[phab:T281480|T281480]] (duration: 01m 09s)
* 17:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 17:01 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:29 ryankemper: [[phab:T281498|T281498]] `sudo -E cumin 'C:role::lvs::balancer' 'sudo run-puppet-agent'`
* 16:28 liw@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
* 16:27 liw@deploy1002: sync-wikiversions aborted: Revert "group[0{{!}}1] wikis to [VERSION]" (duration: 00m 01s)
* 16:22 ryankemper: [[phab:T281498|T281498]] `ryankemper@wdqs2004:~$ sudo depool`
* 16:20 ryankemper: [[phab:T281498|T281498]] `ryankemper@wdqs2004:~$ sudo run-puppet-agent`
* 16:18 otto@deploy1002: Finished deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlst on an-launcher1002 - [[phab:T273789|T273789]] (duration: 02m 39s)
* 16:15 otto@deploy1002: Started deploy [analytics/refinery@b3c5820] (hadoop-test): update event_sanitized_main allowlst on an-launcher1002 - [[phab:T273789|T273789]]
* 16:12 papaul: powerdown thanos-fe2001 for memory swap
* 15:44 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (trying reimaging this host one final time, if this fails again will need to do a deeper investigation into what's going wrong here)
* 15:43 ryankemper: [WDQS] `wdqs2001` is high on update lag but otherwise functioning; will repool when lag is caught up
* 15:37 ryankemper: [WDQS] `sudo systemctl restart wdqs-blazegraph` && `sudo systemctl restart wdqs-updater` on `wdqs2001`
* 15:35 ryankemper: [WDQS] ^ scratch that, depooled `wdqs2001`
* 15:34 ryankemper: [WDQS] pooled `wdqs2001`
* 14:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration
* 14:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog[1002-1003].eqiad.wmnet with reason: eventlog1003 migration
* 13:44 moritzm: installing Java security updates on stat* hosts
* 13:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration
* 13:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1003.eqiad.wmnet with reason: eventlog1003 migration
* 13:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration
* 13:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: eventlog1003 migration
* 13:40 otto@deploy1002: Finished deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlst on an-launcher1002 - [[phab:T273789|T273789]] (duration: 02m 59s)
* 13:37 otto@deploy1002: Started deploy [analytics/refinery@b3c5820]: update event_sanitized_main allowlst on an-launcher1002 - [[phab:T273789|T273789]]
* 13:11 moritzm: installing postgresql-11 security updates
* 13:08 jbond42: merge netbase change to manage /etc/services
* 13:07 liw@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s)
* 13:06 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3
* 12:36 Amir1: upgrading Quiddity to admin in mailman3
* 12:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003
* 12:36 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on eventlog1002.eqiad.wmnet with reason: Testing migration of processors to eventlog1003
* 12:26 moritzm: installing grub2 updates from buster point release
* 12:06 jbond42: update debmonitor.discover.wmnet ssl cert
* 11:59 ladsgroup@deploy1002: Synchronized wmf-config/extension-list: Config: [[gerrit:683454{{!}}Undeploy JADE from production, Part III (T281418)]] (duration: 01m 07s)
* 11:54 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:683453{{!}}Undeploy JADE from production, Part II (T281418)]], Part I (duration: 01m 06s)
* 11:49 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:683452{{!}}Undeploy JADE from production, Part I (T281418)]] (duration: 01m 07s)
* 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 11:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 11:38 mbsantos@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:683548{{!}}Enable suggested values in TemplateData and VisualEditor CommonSettings (T273857)]] (duration: 01m 07s)
* 11:34 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: [[gerrit:683534{{!}}Another fix for token cookie handling (T281346)]] (duration: 01m 07s)
* 11:32 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/specials/SpecialContentTranslation.php: Backport: [[gerrit:683533{{!}}Another fix for token cookie handling (T281346)]] (duration: 01m 08s)
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15658 and previous config saved to /var/cache/conftool/dbconfig/20210429-113211-root.json
* 11:24 mbsantos@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:683547{{!}}Enable suggested values in TemplateData and VisualEditor InitialiseSettings (T273857)]] (duration: 01m 07s)
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15657 and previous config saved to /var/cache/conftool/dbconfig/20210429-111708-root.json
* 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15656 and previous config saved to /var/cache/conftool/dbconfig/20210429-110204-root.json
* 10:59 moritzm: updating apt on buster (SUA 198), which eases bullseye upgrades [[phab:T275873|T275873]]
* 10:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: [[gerrit:683135{{!}}Fix CX token cookie (T281346)]] (duration: 01m 08s)
* 10:54 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/ContentTranslation/modules/base/mw.cx.SiteMapper.js: Backport: [[gerrit:683134{{!}}Fix CX token cookie (T281346)]] (duration: 01m 09s)
* 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15655 and previous config saved to /var/cache/conftool/dbconfig/20210429-104700-root.json
* 10:27 marostegui: Upgrade kernel on db1110
* 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15654 and previous config saved to /var/cache/conftool/dbconfig/20210429-102447-marostegui.json
* 09:42 volans: uploaded pynetbox 5.3.0-2 to bullseye-wikimedia on qpt.w.o
* 09:39 volans@deploy1002: Finished deploy [homer/deploy@e394769]: Release v0.2.8 (duration: 03m 30s)
* 09:35 volans@deploy1002: Started deploy [homer/deploy@e394769]: Release v0.2.8
* 09:01 jynus: stop replication and checking data of db2100:s7
* 08:57 marostegui: Upgrade kernel on db2133
* 08:51 marostegui: Upgrade kernel on db2125
* 08:50 marostegui: Upgrade kernel on db2124
* 08:46 marostegui: Upgrade kernel on db2122
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 100%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15652 and previous config saved to /var/cache/conftool/dbconfig/20210429-084011-root.json
* 08:39 marostegui: Upgrade kernel on db2121
* 08:33 marostegui: Upgrade kernel on db2120
* 08:28 volans@deploy1002: Finished deploy [homer/deploy@89cd07c]: Release v0.2.7 (duration: 03m 08s)
* 08:27 marostegui: Upgrade kernel on db2115
* 08:25 volans@deploy1002: Started deploy [homer/deploy@89cd07c]: Release v0.2.7
* 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 80%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15651 and previous config saved to /var/cache/conftool/dbconfig/20210429-082507-root.json
* 08:19 marostegui: Upgrade kernel on db2114
* 08:12 marostegui: Upgrade kernel on db2109
* 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 70%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15649 and previous config saved to /var/cache/conftool/dbconfig/20210429-081004-root.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 60%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15648 and previous config saved to /var/cache/conftool/dbconfig/20210429-075500-root.json
* 07:54 marostegui: Upgrade kernel on db2089
* 07:48 jynus: rolling restart of bacula hosts [[phab:T273182|T273182]]
* 07:48 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1007 (duration: 01m 07s)
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15647 and previous config saved to /var/cache/conftool/dbconfig/20210429-074625-root.json
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 50%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15646 and previous config saved to /var/cache/conftool/dbconfig/20210429-073956-root.json
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 90%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15645 and previous config saved to /var/cache/conftool/dbconfig/20210429-073122-root.json
* 07:28 marostegui: Stop mysql and upgrade kernel on pc1007
* 07:28 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Depool pc1007 (duration: 01m 08s)
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 40%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15644 and previous config saved to /var/cache/conftool/dbconfig/20210429-072453-root.json
* 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 80%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15643 and previous config saved to /var/cache/conftool/dbconfig/20210429-071618-root.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 25%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15642 and previous config saved to /var/cache/conftool/dbconfig/20210429-070949-root.json
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15641 and previous config saved to /var/cache/conftool/dbconfig/20210429-070114-root.json
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1083 (re)pooling @ 10%: Repool db1083', diff saved to https://phabricator.wikimedia.org/P15640 and previous config saved to /var/cache/conftool/dbconfig/20210429-065445-root.json
* 06:53 godog: add 100G to prometheus/ops in eqiad
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 60%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15639 and previous config saved to /var/cache/conftool/dbconfig/20210429-064611-root.json
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15637 and previous config saved to /var/cache/conftool/dbconfig/20210429-063107-root.json
* 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 40%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15636 and previous config saved to /var/cache/conftool/dbconfig/20210429-061603-root.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 30%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15635 and previous config saved to /var/cache/conftool/dbconfig/20210429-060100-root.json
* 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15634 and previous config saved to /var/cache/conftool/dbconfig/20210429-054556-root.json
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 20%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15633 and previous config saved to /var/cache/conftool/dbconfig/20210429-053052-root.json
* 05:22 marostegui: Check tables on db1121 (this will cause lag on s4 commonswiki, on wikireplicas)
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 for tables checking', diff saved to https://phabricator.wikimedia.org/P15632 and previous config saved to /var/cache/conftool/dbconfig/20210429-052146-marostegui.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 15%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15631 and previous config saved to /var/cache/conftool/dbconfig/20210429-051549-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Slowly pool into s2 db1156', diff saved to https://phabricator.wikimedia.org/P15630 and previous config saved to /var/cache/conftool/dbconfig/20210429-050045-root.json
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15629 and previous config saved to /var/cache/conftool/dbconfig/20210429-045557-marostegui.json
* 04:50 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15627 and previous config saved to /var/cache/conftool/dbconfig/20210429-045015-marostegui.json
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15626 and previous config saved to /var/cache/conftool/dbconfig/20210429-044458-marostegui.json
* 04:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE
* 04:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1118.eqiad.wmnet with reason: REIMAGE
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1156 into s2 for the first time with minimal weight [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15625 and previous config saved to /var/cache/conftool/dbconfig/20210429-043857-marostegui.json
* 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1156 to dbctl [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15624 and previous config saved to /var/cache/conftool/dbconfig/20210429-043812-marostegui.json
* 04:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 for reimage', diff saved to https://phabricator.wikimedia.org/P15623 and previous config saved to /var/cache/conftool/dbconfig/20210429-042757-marostegui.json
* 02:59 milimetric@deploy1002: Finished deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job (duration: 00m 06s)
* 02:59 milimetric@deploy1002: Started deploy [analytics/refinery@740226b] (thin): Hotfix for referrer job
* 02:58 milimetric@deploy1002: Finished deploy [analytics/refinery@740226b]: Hotfix for referrer job (duration: 14m 40s)
* 02:44 milimetric@deploy1002: Started deploy [analytics/refinery@740226b]: Hotfix for referrer job
* 01:44 krinkle@deploy1002: Synchronized wmf-config/mc.php: {{Gerrit|I5869b3c3ba4a}} (duration: 01m 08s)
* 01:23 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 01:21 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 01:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:19 ryankemper: [[phab:T280382|T280382]] Aborted data transfer; `wdqs2007` is hosed (see https://phabricator.wikimedia.org/T281437)
* 01:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 00:40 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/specials/pagers/ImageListPager.php: [[phab:T281405|T281405]] (duration: 01m 08s)
* 00:11 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 00:06 ryankemper: [[phab:T280382|T280382]] `wdqs1013.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv  2.7T  998G  1.6T  39% /srv`


== 2021-04-28 ==
== 2021-07-30 ==
* 23:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:38 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 23:36 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 23:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 23:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 23:06 dpifke@deploy1002: Finished deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886 (duration: 00m 05s)
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 23:06 dpifke@deploy1002: Started deploy [performance/navtiming@cf8b2e9]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/682886
* 22:44 dwisehaupt: civiproxy revision changed to {{Gerrit|99cecb924a}} - initial rollout of code for testing
* 22:26 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 22:26 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 22:18 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 22:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 22:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:46 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 21:44 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 21:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1013.eqiad.wmnet with reason: REIMAGE
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:39 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 21:38 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 ryankemper: [[phab:T280382|T280382]] `wdqs2007` is reachable again; glancing at `/srv/wdqs` its `wikidata.jnl` is `839G` when it should be `975G` so I'll re-do the wikidata journal transfer
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 21:32 ryankemper: [[phab:T280382|T280382]] [WDQS] `wdqs2007` ssh is unreachable; power cycling via `racadm>>racadm serveraction powercycle`
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 21:24 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage` (previous reimage timed out, instance appears to have rebooted)
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 21:07 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 21:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:04 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 21:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 21:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 20:00 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 19:57 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.1"
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 19:56 robh@cumin1001: START - Cookbook sre.dns.netbox
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 19:13 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]] (duration: 01m 07s)
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 19:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3  refs [[phab:T278347|T278347]]
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 18:21 legoktm: added mvolz as listadmin for services@ and reset admin pw ([[phab:T278516|T278516]])
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 17:12 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Wikibase/client/includes/DataAccess/Scribunto/WikibaseLanguageIndependentLuaBindings.php: {{Gerrit|b392dba0d77904d7de819043e51d8c3fbf003873}}: Fix incorrect ItemId typehint in Lua bindings ([[phab:T281361|T281361]]) (duration: 01m 09s)
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 16:52 papaul: powerdown logstash2034 for relocation
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 16:29 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 16:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: REIMAGE
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 16:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 16:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: REIMAGE
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 16:27 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 16:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 16:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: REIMAGE
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 16:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: REIMAGE
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 16:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 16:21 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: REIMAGE
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 16:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: REIMAGE
* 11:23 moritzm: installing libsndfile security updates on stretch
* 16:19 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 16:12 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 15:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 15:25 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 15:20 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts conf[2001-2003].codfw.wmnet
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 15:12 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 15:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on sessionstore2001.codfw.wmnet with reason: Server relocation
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 15:03 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 15:00 moritzm: imported python-poolcounter 0.0.2-1+deb11u1 to apt.wikimedia.org [[phab:T275873|T275873]]
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 14:53 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts conf[2001-2003].codfw.wmnet
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 14:44 moritzm: imported gitlab-ce 13.9.7-ce.0 to apt.wikimedia.org
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json
* 14:40 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d] (duration: 04m 59s)
* 14:35 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@559d98d]
* 14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d] (duration: 00m 06s)
* 14:34 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d] (thin): Regular analytics weekly train THIN [analytics/refinery@559d98d]
* 14:34 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 03m 07s)
* 14:32 moritzm: installing iproute2 updates from buster point release
* 14:31 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
* 14:30 milimetric@deploy1002: deploy aborted: - (duration: 00m 00s)
* 14:30 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: -
* 14:30 milimetric@deploy1002: Finished deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d] (duration: 12m 31s)
* 14:26 moritzm: installing net-snmp updates from buster point release
* 14:17 milimetric@deploy1002: Started deploy [analytics/refinery@559d98d]: Regular analytics weekly train [analytics/refinery@559d98d]
* 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 13:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 13:15 jayme: restarting pybal on lvs5001,lvs4005,lvs2007 - [[phab:T271573|T271573]]
* 13:14 liw@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 3.17.0-wmf.1"
* 13:10 jayme: restarting pybal on lvs5002,lvs4006,lvs2008 - [[phab:T271573|T271573]]
* 13:04 liw@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.3 (duration: 01m 07s)
* 13:03 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 13:03 liw@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.3
* 13:02 moritzm: upgrading deployment servers to PHP 7.4.32
* 12:55 moritzm: upgrading snapshot hosts to PHP 7.4.32
* 12:48 jayme: restarting pybal on lvs2009 - [[phab:T271573|T271573]]
* 12:45 moritzm: upgrading labweb to PHP 7.4.32
* 12:43 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 12:42 jayme: restarting pybal on lvs5003,lvs4007 - [[phab:T271573|T271573]]
* 12:39 jayme: restarting pybal on lvs2010 - [[phab:T271573|T271573]]
* 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
* 12:28 apergos: manually edited /srv/deployment/dumps/dumps-cache/config on snapshots1011,12,13 to change deploy1001 to deploy1002 (where did it get the old value from? these are new installs!)
* 12:16 moritzm: rolling restart of cassandra in restbase-dev to pick up Java security updates
* 12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 12:15 jmm@cumin2001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99)
* 12:15 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
* 11:53 jayme: switching SRV record _etcd._tcp to new etcd cluster (for codfw, eqsin, ulsfo)
* 11:22 Urbanecm: EU B&C window done
* 11:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: {{Gerrit|8d0ae5e8fedefa911fc216bfc810d7a6169ea7e5}}: Separate reference preview settings in beta & non-beta ([[phab:T281235|T281235]]) (duration: 01m 08s)
* 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ddbc378e41783356e28cd90bbefa08624ea2844c}}: Enable partial action blocks on testwiki ([[phab:T280528|T280528]]) (duration: 01m 07s)
* 11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 11:03 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 11:03 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 10:44 jbond42: updated the check-raid nrpe script to python3
* 09:40 moritzm: restarting Tomcat on idp-test1001 to pick up Java security updates
* 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15618 and previous config saved to /var/cache/conftool/dbconfig/20210428-092103-root.json
* 09:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1001.wikimedia.org
* 09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint1001.wikimedia.org
* 09:09 moritzm: restarting jenkins* on releases to pick up Java security updates
* 09:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2001.wikimedia.org
* 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15617 and previous config saved to /var/cache/conftool/dbconfig/20210428-090559-root.json
* 08:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host contint2001.wikimedia.org
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15616 and previous config saved to /var/cache/conftool/dbconfig/20210428-085056-root.json
* 08:42 urbanecm@deploy1002: Synchronized wmf-config/InterwikiSortOrders.php: {{Gerrit|96ad0d4ad294c442b4936a63ae1cd9de9c098aa9}}: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 01m 08s)
* 08:41 urbanecm@deploy1002: sync-file aborted: {{Gerrit|96ad0d4ad294c442b4936a63ae1cd9de9c098aa9}}: Add alt, bcl, diq, mad, mni, mnw, nia, skr, tay and trv to InterwikiSortOrders (duration: 00m 02s)
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15615 and previous config saved to /var/cache/conftool/dbconfig/20210428-083625-marostegui.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15614 and previous config saved to /var/cache/conftool/dbconfig/20210428-083552-root.json
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P15613 and previous config saved to /var/cache/conftool/dbconfig/20210428-083458-root.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15612 and previous config saved to /var/cache/conftool/dbconfig/20210428-082625-root.json
* 08:25 effie: update php7.2 on jobrunners and parsoid servers && rolling  php7.2-fpm restarts
* 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15611 and previous config saved to /var/cache/conftool/dbconfig/20210428-081121-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15610 and previous config saved to /var/cache/conftool/dbconfig/20210428-075618-root.json
* 07:52 effie: update php7.2 on api servers && rolling  php7.2-fpm restarts
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15609 and previous config saved to /var/cache/conftool/dbconfig/20210428-074114-root.json
* 07:40 marostegui: Deploy schema change on db1098:3316 and db1098:3316 [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 07:27 effie: update php7.2 on appservers && rolling  php7.2-fpm restarts
* 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098 for schema change and kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15608 and previous config saved to /var/cache/conftool/dbconfig/20210428-072609-marostegui.json
* 07:19 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:14 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:12 elukey: add AAAA record for kafka-main200[3,4,5].codfw.wmnet
* 07:10 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:05 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:04 elukey: add AAAA record for kafka-main2002.codfw.wmnet
* 07:03 marostegui: Deploy schema change on db2089:3316 and db1098:3316 [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 06:26 legoktm: created mailman3 superusers for Administrator (noc@), Ladsgroup and Legoktm
* 06:23 legoktm: legoktm@lists1001:~$ sudo mailman-web set_default_site --name lists.wikimedia.org --domain lists.wikimedia.org
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15607 and previous config saved to /var/cache/conftool/dbconfig/20210428-061426-root.json
* 06:00 marostegui: Stop MySQL on db2096 (x1 codfw) [[phab:T281135|T281135]]
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15606 and previous config saved to /var/cache/conftool/dbconfig/20210428-055922-root.json
* 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1167 in s8 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15605 and previous config saved to /var/cache/conftool/dbconfig/20210428-055144-marostegui.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15604 and previous config saved to /var/cache/conftool/dbconfig/20210428-054419-root.json
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15603 and previous config saved to /var/cache/conftool/dbconfig/20210428-052915-root.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P15602 and previous config saved to /var/cache/conftool/dbconfig/20210428-051526-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 (old s1 master) for schema change', diff saved to https://phabricator.wikimedia.org/P15601 and previous config saved to /var/cache/conftool/dbconfig/20210428-050754-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1163 to s1 master and remove read-only from s1 [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15600 and previous config saved to /var/cache/conftool/dbconfig/20210428-050138-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s1 as read-only for maintenance [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15599 and previous config saved to /var/cache/conftool/dbconfig/20210428-050041-marostegui.json
* 05:00 marostegui: Starting s1 eqiad failover from db1083 to db1163 - [[phab:T278214|T278214]]
* 04:14 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 04:14 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 04:08 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 04:08 marostegui: Start replication changes, connect everything to db1163 [[phab:T278214|T278214]]
* 04:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 04:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1163 with weight 0 before the switchover [[phab:T278214|T278214]]', diff saved to https://phabricator.wikimedia.org/P15598 and previous config saved to /var/cache/conftool/dbconfig/20210428-040718-marostegui.json
* 03:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 03:51 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
* 03:49 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs2007.codfw.wmnet
* 03:48 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1013.eqiad.wmnet
* 03:33 ryankemper: `sudo systemctl restart wdqs-blazegraph` on `wdqs1012` to clear the `WDQS SPARQL` warning
* 03:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2007.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1013.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 02:33 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 02:28 robh@cumin1001: START - Cookbook sre.dns.netbox
* 01:06 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 01:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 00:03 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
* 00:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE


== 2021-04-27 ==
== 2021-07-29 ==
* 23:58 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1015.eqiad.wmnet with reason: REIMAGE
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 23:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1014.eqiad.wmnet with reason: REIMAGE
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 23:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 23:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1013.eqiad.wmnet with reason: REIMAGE
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 23:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 23:52 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1012.eqiad.wmnet with reason: REIMAGE
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 23:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1011.eqiad.wmnet with reason: REIMAGE
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 21:07 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2005-2006].codfw.wmnet
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 20:55 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2005-2006].codfw.wmnet
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 20:54 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[2003-2004].codfw.wmnet
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 20:42 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[2003-2004].codfw.wmnet
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:32 bblack: re-pooling codfw public traffic - [[phab:T279457|T279457]]
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 20:11 jhuneidi@deploy1002: Synchronized php-1.37.0-wmf.3/includes/rcfeed/IRCColourfulRCFeedFormatter.php: Backport rcfeed: Remove reference assignment ([[phab:T281226|T281226]]) to 1.37.0-wmf.3 (duration: 01m 12s)
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 20:08 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 20:06 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2005.codfw.wmnet with reason: REIMAGE
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 19:44 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 19:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 19:35 papaul: powerdown ms-backup2001  for maintenance
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 19:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main2004.codfw.wmnet with reason: REIMAGE
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 19:07 papaul: powerdown logstash2035  for maintenance
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 19:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 19:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1003.eqiad.wmnet
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 18:50 mutante: people1003 - destroying VM and recreating again from scratch to test if issue of no console and no access is repeatable
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 18:50 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1003.eqiad.wmnet
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 18:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 18:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: REIMAGE
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 18:33 mutante: people1003 - rebooting, trying to get new VM to work
* 14:11 vgutierrez: restart pybal on lvs2009
* 18:33 Urbanecm: Morning B&C window done
* 14:09 vgutierrez: restart pybal on lvs2010
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|91a85f2}}: {{Gerrit|ac770bf}}: Enable language in header for office and testwiki users ([[phab:T280526|T280526]]) (duration: 01m 19s)
* 14:07 vgutierrez: restart pybal on lvs2008
* 18:32 bblack: lvs2009 - restart pybal + re-run puppet agent - [[phab:T279457|T279457]]
* 14:05 vgutierrez: restart pybal on lvs2007
* 18:23 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:59 vgutierrez: restart pybal on lvs1014
* 18:20 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[56].codfw.wmnet
* 13:55 vgutierrez: restart pybal on lvs1015
* 18:20 bblack: cp203[56] - repooling in etcd - [[phab:T279457|T279457]]
* 13:52 _joe_: restarting pybal on lvs1016
* 18:19 robh@cumin1001: START - Cookbook sre.dns.netbox
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 18:17 robh@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 18:17 robh@cumin1001: START - Cookbook sre.dns.netbox
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 18:16 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 18:12 robh@cumin1001: START - Cookbook sre.dns.netbox
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 18:11 bblack: dns2001 - restarting bird to repool, then re-enabling puppet - [[phab:T279457|T279457]]
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 18:04 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 18:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 18:02 ejegg: update payments-wiki from {{Gerrit|9a4eef1375}} to {{Gerrit|44570561f2}}
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 18:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 17:58 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1004.eqiad.wmnet with reason: REIMAGE
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 17:34 papaul: powerdown moss-fe2001  for maintenance
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 17:32 robh@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 17:29 robh@cumin1001: START - Cookbook sre.dns.netbox
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:25 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 17:23 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 17:21 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 17:19 ryankemper: [[phab:T281215|T281215]] Banned `elastic2043` from codfw cirrussearch cluster
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 17:16 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 17:14 papaul: powerdown kafka-logging2003  for maintenance
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:14 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 07:52 moritzm: restarting Tomcat on idp-test
* 17:10 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 17:09 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 17:07 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 17:04 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 16:52 papaul: powerdown elastic2045  for maintenance
* 16:49 papaul: powerdown ms-be2042 for maintenance
* 16:39 dcaro: reprepro updating packages on thirdparty/ceph-nautilus-buster
* 16:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:29 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 39 hosts with reason: upgrading openstack
* 16:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 39 hosts with reason: upgrading openstack
* 16:22 effie: upgrading scap 3.17.1-1 on mediawiki canaries - [[phab:T279695|T279695]]
* 16:18 effie: uploading scap_3.17.1-1
* 16:18 effie: uploading cap_3.17.1-1
* 15:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1026.eqiad.wmnet
* 14:48 moritzm: installing file/libmagic updates from buster point release
* 14:47 bblack: lvs2009 - disable puppet + stop pybal (internal services will move to lvs2010, please avoid LVS service definition changes for now!) - [[phab:T279457|T279457]]
* 14:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2003.codfw.wmnet
* 14:36 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[56].codfw.wmnet
* 14:36 bblack: cp203[56] - depool all etcd services via confctl - [[phab:T279457|T279457]]
* 14:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2003.codfw.wmnet
* 14:33 bblack: dns2001 - depooling for [[phab:T279457|T279457]] (disable puppet + stop bird)
* 14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2002.codfw.wmnet
* 14:31 moritzm: installing imagemagick security updates
* 14:28 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2002.codfw.wmnet
* 14:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
* 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 14:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
* 14:20 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 14:19 moritzm: installing xen security updates
* 14:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:17 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:16 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:16 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:15 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
* 14:15 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'staging' .
* 14:14 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
* 14:09 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
* 14:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:08 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
* 14:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 105 hosts with reason: upgrading openstack
* 14:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 105 hosts with reason: upgrading openstack
* 14:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 9 hosts with reason: upgrading                  openstack
* 14:00 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 9 hosts with reason: upgrading                  openstack
* 13:58 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
* 13:56 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
* 13:55 moritzm: imported jenkins 2.277.3 to thirdparty/ci
* 13:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
* 13:48 moritzm: uploaded openjdk-8 8u292-b10-0~deb10u1 (buster forward port of latest Java 8 security release)
* 13:46 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:46 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:45 akosiaris: switchover api-gateway, changeprop, cpjobqueue to use the new redis cluster servers (rdb2007-rdb2010)
* 13:45 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:45 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:44 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 13:44 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 13:34 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:34 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 13:33 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:30 hashar: Upgrading CI Jenkins from 2.263.3 to 2.277.2
* 13:23 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 13:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1020-1026].eqiad.wmnet
* 13:19 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 13:13 liw@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.3
* 13:08 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/includes/Config/WikiPageConfigValidation.php: {{Gerrit|fe2a0420fd884df7046c0c283bcb2e961e74e8e9}}: WikiPageConfigValidation: Mentor lists and help desk can be null ([[phab:T281229|T281229]]) (duration: 01m 06s)
* 13:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 13:07 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 13:06 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1020-1026].eqiad.wmnet
* 13:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be1019.eqiad.wmnet
* 12:55 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be1019.eqiad.wmnet
* 12:46 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:682815{{!}}Revert "URGENT: Disable GlobalUsage" (T281242)]] (duration: 01m 08s)
* 12:44 hashar: Restarted CI Jenkins for plugins upgrade
* 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15592 and previous config saved to /var/cache/conftool/dbconfig/20210427-122619-root.json
* 12:20 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GlobalUsage: Backport: [[gerrit:682814{{!}}Avoid reading primary unless absolutely necessary (T281238)]] (duration: 01m 09s)
* 12:12 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GlobalUsage: Backport: [[gerrit:682813{{!}}Avoid reading primary unless absolutely necessary (T281238)]] (duration: 01m 09s)
* 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15591 and previous config saved to /var/cache/conftool/dbconfig/20210427-121115-root.json
* 12:00 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on labstore1007.wikimedia.org with reason: [[phab:T281045|T281045]]
* 12:00 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on labstore1007.wikimedia.org with reason: [[phab:T281045|T281045]]
* 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15590 and previous config saved to /var/cache/conftool/dbconfig/20210427-115612-root.json
* 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P15589 and previous config saved to /var/cache/conftool/dbconfig/20210427-114108-root.json
* 11:36 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
* 11:30 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
* 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Remove RW from commonswiki', diff saved to https://phabricator.wikimedia.org/P15588 and previous config saved to /var/cache/conftool/dbconfig/20210427-111016-marostegui.json
* 11:09 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Disable GlobalUsage (duration: 01m 08s)
* 10:40 volans@cumin1001: dbctl commit (dc=all): 'S4 RO, outage', diff saved to https://phabricator.wikimedia.org/P15585 and previous config saved to /var/cache/conftool/dbconfig/20210427-104057-volans.json
* 10:18 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 10:06 XioNoX: standardize management routers ACLs with Capirca - mr1-eqiad (last one)
* 10:01 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 02m 16s)
* 09:59 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
* 09:56 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 00m 22s)
* 09:56 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157 for schema change', diff saved to https://phabricator.wikimedia.org/P15584 and previous config saved to /var/cache/conftool/dbconfig/20210427-093536-marostegui.json
* 09:35 XioNoX: standardize management routers ACLs with Capirca - mr1-eqsin
* 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15583 and previous config saved to /var/cache/conftool/dbconfig/20210427-093501-root.json
* 09:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
* 09:34 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
* 09:33 moritzm: rolling restart of elastic in relforge* to pick up Java updates
* 09:32 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
* 09:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
* 09:31 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15582 and previous config saved to /var/cache/conftool/dbconfig/20210427-091957-root.json
* 09:19 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
* 09:19 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
* 09:17 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
* 09:16 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host rdb2010.codfw.wmnet
* 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
* 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
* 09:16 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
* 09:16 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
* 09:11 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 09:11 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb2010.codfw.wmnet with reason: REIMAGE
* 09:09 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on rdb2009.codfw.wmnet with reason: REIMAGE
* 09:07 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
* 09:06 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2010.codfw.wmnet with reason: REIMAGE
* 09:05 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
* 09:05 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1012.eqiad.wmnet with reason: REIMAGE
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15581 and previous config saved to /var/cache/conftool/dbconfig/20210427-090454-root.json
* 09:04 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2009.codfw.wmnet with reason: REIMAGE
* 09:04 jayme@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
* 09:03 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1011.eqiad.wmnet with reason: REIMAGE
* 09:01 legoktm@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P15580 and previous config saved to /var/cache/conftool/dbconfig/20210427-084950-root.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for schema change', diff saved to https://phabricator.wikimedia.org/P15579 and previous config saved to /var/cache/conftool/dbconfig/20210427-084651-marostegui.json
* 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15578 and previous config saved to /var/cache/conftool/dbconfig/20210427-084630-root.json
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114 into main and api', diff saved to https://phabricator.wikimedia.org/P15577 and previous config saved to /var/cache/conftool/dbconfig/20210427-083910-marostegui.json
* 08:36 XioNoX: standardize management routers ACLs with Capirca
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114 into main and traffic', diff saved to https://phabricator.wikimedia.org/P15576 and previous config saved to /var/cache/conftool/dbconfig/20210427-083145-marostegui.json
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15575 and previous config saved to /var/cache/conftool/dbconfig/20210427-083126-root.json
* 08:24 hashar: Restarting CI Jenkins for plugins upgrade
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114 into main and traffic', diff saved to https://phabricator.wikimedia.org/P15574 and previous config saved to /var/cache/conftool/dbconfig/20210427-081911-marostegui.json
* 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 100%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15573 and previous config saved to /var/cache/conftool/dbconfig/20210427-081846-root.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15572 and previous config saved to /var/cache/conftool/dbconfig/20210427-081623-root.json
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 100%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15571 and previous config saved to /var/cache/conftool/dbconfig/20210427-081325-root.json
* 08:12 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2008.codfw.wmnet with reason: REIMAGE
* 08:11 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 08:10 legoktm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb2007.codfw.wmnet with reason: REIMAGE
* 08:10 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2008.codfw.wmnet with reason: REIMAGE
* 08:08 legoktm@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb2007.codfw.wmnet with reason: REIMAGE
* 08:03 jayme@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
* 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 90%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15570 and previous config saved to /var/cache/conftool/dbconfig/20210427-080342-root.json
* 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P15569 and previous config saved to /var/cache/conftool/dbconfig/20210427-080119-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 75%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15568 and previous config saved to /var/cache/conftool/dbconfig/20210427-075822-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for schema change', diff saved to https://phabricator.wikimedia.org/P15567 and previous config saved to /var/cache/conftool/dbconfig/20210427-075759-marostegui.json
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15566 and previous config saved to /var/cache/conftool/dbconfig/20210427-075738-root.json
* 07:52 liw@deploy1002: Pruned MediaWiki: 1.36.0-wmf.38 (duration: 03m 17s)
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 80%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15565 and previous config saved to /var/cache/conftool/dbconfig/20210427-074839-root.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 50%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15564 and previous config saved to /var/cache/conftool/dbconfig/20210427-074318-root.json
* 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15563 and previous config saved to /var/cache/conftool/dbconfig/20210427-074234-root.json
* 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 75%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15562 and previous config saved to /var/cache/conftool/dbconfig/20210427-073335-root.json
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1087 (re)pooling @ 25%: Repool db1087', diff saved to https://phabricator.wikimedia.org/P15561 and previous config saved to /var/cache/conftool/dbconfig/20210427-072814-root.json
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15560 and previous config saved to /var/cache/conftool/dbconfig/20210427-072731-root.json
* 07:26 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 07:24 liw@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.3 (duration: 30m 54s)
* 07:21 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 07:21 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf[2004-2006].codfw.wmnet with reason: for zookeeper migration
* 07:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on conf[2002-2003].codfw.wmnet with reason: for zookeeper migration
* 07:19 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on conf[2002-2003].codfw.wmnet with reason: for zookeeper migration
* 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 60%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15559 and previous config saved to /var/cache/conftool/dbconfig/20210427-071831-root.json
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P15558 and previous config saved to /var/cache/conftool/dbconfig/20210427-071227-root.json
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 50%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15557 and previous config saved to /var/cache/conftool/dbconfig/20210427-070328-root.json
* 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 for schema change', diff saved to https://phabricator.wikimedia.org/P15556 and previous config saved to /var/cache/conftool/dbconfig/20210427-065628-marostegui.json
* 06:55 elukey: upgrade mariadb to 10.4.18-1 + reboot on db1108 - [[phab:T279281|T279281]]
* 06:54 liw@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.3
* 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 40%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15555 and previous config saved to /var/cache/conftool/dbconfig/20210427-064824-root.json
* 06:37 liw: version 1.37.0-wmf.3 was branched at {{Gerrit|20ab303fd1d883592b4d2ec2468dfaccad7a9e10}} for [[phab:T278347|T278347]]
* 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 30%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15554 and previous config saved to /var/cache/conftool/dbconfig/20210427-063320-root.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 25%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15553 and previous config saved to /var/cache/conftool/dbconfig/20210427-061817-root.json
* 06:11 elukey: powercycle elastic2043 - no ssh, no tty remote console available
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 20%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15552 and previous config saved to /var/cache/conftool/dbconfig/20210427-060313-root.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 15%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15551 and previous config saved to /var/cache/conftool/dbconfig/20210427-054809-root.json
* 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 10%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15550 and previous config saved to /var/cache/conftool/dbconfig/20210427-053306-root.json
* 05:30 XioNoX: push pfw fw policies - [[phab:T281137|T281137]]
* 05:27 legoktm: imported hyperkitty_1.3.4-2~bpo10+2 to apt.wm.o ([[phab:T281213|T281213]])
* 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15549 and previous config saved to /var/cache/conftool/dbconfig/20210427-052236-root.json
* 05:21 marostegui: Stop mysql on db1087 to clone db1167 (lag will appear on wikidata on wikireplicas) [[phab:T258361|T258361]]
* 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1114 temporarily as db1087 will be depooled', diff saved to https://phabricator.wikimedia.org/P15547 and previous config saved to /var/cache/conftool/dbconfig/20210427-052026-marostegui.json
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 5%: Slowly pool into s7 db1124', diff saved to https://phabricator.wikimedia.org/P15546 and previous config saved to /var/cache/conftool/dbconfig/20210427-051802-root.json
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15545 and previous config saved to /var/cache/conftool/dbconfig/20210427-050826-marostegui.json
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15544 and previous config saved to /var/cache/conftool/dbconfig/20210427-050732-root.json
* 05:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1077.eqiad.wmnet
* 04:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1077.eqiad.wmnet
* 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15543 and previous config saved to /var/cache/conftool/dbconfig/20210427-045229-root.json
* 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 with minimal weight for the first time in s7 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15541 and previous config saved to /var/cache/conftool/dbconfig/20210427-044609-marostegui.json
* 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1124 to dbctl, depooled, [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15540 and previous config saved to /var/cache/conftool/dbconfig/20210427-044520-marostegui.json
* 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15539 and previous config saved to /var/cache/conftool/dbconfig/20210427-043725-root.json
* 04:25 legoktm: upgrading lists-next.wikimedia.org to mailman3-from-bullseye ([[phab:T280887|T280887]])
* 04:19 marostegui: Set phabricator on read only [[phab:T279625|T279625]]
* 03:37 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 03:37 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 03:37 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 03:36 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@08ad17a]: 0.3.70 (duration: 08m 18s)
* 03:28 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.70` on canary `wdqs1003`; proceeding to rest of fleet
* 03:28 ryankemper@deploy1002: Started deploy [wdqs/wdqs@08ad17a]: 0.3.70
* 03:27 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.70`. Pre-deploy tests passing on canary `wdqs1003`
* 03:17 ryankemper: [[phab:T280382|T280382]] `wdqs1006` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to raid0: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 02:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:29 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph --task-id [[phab:T280382|T280382]]` on `ryankemper@cumin1001` tmux session `reimage`
* 01:29 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 01:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 01:21 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 01:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer


== 2021-04-26 ==
== 2021-07-28 ==
* 23:28 mutante: renewing TLS cert for peopleweb.discovery.wmnet, adding *3 hosts
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 23:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 23:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people1003.eqiad.wmnet with reason: new host
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 22:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 22:24 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1006.eqiad.wmnet with reason: REIMAGE
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 22:11 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] --new wdqs1006.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 21:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1003.eqiad.wmnet
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 20:48 twentyafterfour: restarting php-fpm on phab1001 to deploy phabricator hotfix {{Gerrit|d238db85b8d8072d99f31805aa4a8a7cf0c09941}}
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 20:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1003.eqiad.wmnet
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 20:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts planet1003.eqiad.wmnet
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts planet1003.eqiad.wmnet
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 19:45 legoktm: uploaded python3-falcon, python3-mimeparse, python3-mujson, openstack-pkg-tools to mailman3 component on apt.wm.o
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 18:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 18:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 18:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: REIMAGE
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 18:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 18:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1002.eqiad.wmnet with reason: REIMAGE
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 18:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1001.eqiad.wmnet with reason: REIMAGE
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 18:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2d16f6251a67cf13cef02bbdcb3c9f5c1c505d16}}: elwiki: Update Growth experiments configuration ([[phab:T280172|T280172]]) (duration: 00m 58s)
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 18:06 urbanecm@deploy1002: Synchronized multiversion/MWScript.php: {{Gerrit|5ace4e1b806bcfc4ea059f9e9cae9aa94c0bdbd1}}: Fix error message if MWScript.php is run without arguments (duration: 00m 58s)
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 17:28 dduvall@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 17:26 dduvall@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:18 dduvall@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:06 legoktm: imported postorius_1.3.4-2~bpo10+2 to apt.wm.o
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 16:49 mutante: gerrit - restarted apache (hard) to remove time out from gerrit:682502
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 16:40 mutante: gerrit1001 - reload apache2
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 16:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1025.eqiad.wmnet
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1025.eqiad.wmnet
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 15:26 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 15:24 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:21 elukey: restart zookeeper on conf2004 to pick up the -javaagent setting for the prometheus exporter
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 15:06 moritzm: installing jquery security updates on stretch
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:01 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:54 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 14:54 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:48 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 14:47 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 14:28 moritzm: installing ldap-replica1003/1004
* 13:29 moritzm: installing python2.7 security updates on stretch
* 14:03 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on conf2001.codfw.wmnet with reason: for zookeeper migration
* 13:08 moritzm: installing python3.5 security updates on stretch
* 14:03 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on conf2001.codfw.wmnet with reason: for zookeeper migration
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15537 and previous config saved to /var/cache/conftool/dbconfig/20210426-133922-root.json
* 11:27 moritzm: installing nginx security updates on thumbor*
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15536 and previous config saved to /var/cache/conftool/dbconfig/20210426-133905-root.json
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 13:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: for zookeeper migration
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 13:27 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: for zookeeper migration
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15535 and previous config saved to /var/cache/conftool/dbconfig/20210426-132533-root.json
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15534 and previous config saved to /var/cache/conftool/dbconfig/20210426-132417-root.json
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15533 and previous config saved to /var/cache/conftool/dbconfig/20210426-132402-root.json
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:14 moritzm: installing ldap-replica2005/2006
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15532 and previous config saved to /var/cache/conftool/dbconfig/20210426-131029-root.json
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15531 and previous config saved to /var/cache/conftool/dbconfig/20210426-130913-root.json
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15530 and previous config saved to /var/cache/conftool/dbconfig/20210426-130858-root.json
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:57 moritzm: installing gst-plugins-base1.0 security updates
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15529 and previous config saved to /var/cache/conftool/dbconfig/20210426-125526-root.json
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P15528 and previous config saved to /var/cache/conftool/dbconfig/20210426-125409-root.json
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15527 and previous config saved to /var/cache/conftool/dbconfig/20210426-125354-root.json
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15526 and previous config saved to /var/cache/conftool/dbconfig/20210426-124141-marostegui.json
* 08:27 Amir1: running several long-running queries against pc1007
* 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P15525 and previous config saved to /var/cache/conftool/dbconfig/20210426-124022-root.json
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P15524 and previous config saved to /var/cache/conftool/dbconfig/20210426-123020-marostegui.json
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:28 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 07:53 moritzm: installing aspell security updates on stretch
* 12:27 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 12:24 Amir1: cleaning watchlist of QuickStatementsBot in wikidatawiki
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 12:06 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 12:05 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,service=nginx,name=mw1338.eqiad.wmnet
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 12:00 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Enable writes on es4 [[phab:T279281|T279281]] (duration: 00m 56s)
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 11:57 marostegui: Restart es4 primary master - [[phab:T279281|T279281]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 11:55 marostegui@deploy1002: Synchronized wmf-config/db-eqiad.php: Disable writes on es4 [[phab:T279281|T279281]] (duration: 00m 56s)
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 11:51 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 11:49 hashar@deploy1002: Finished deploy [integration/docroot@c2e48c9]: doc: Explain that VE is both stand-alone and integrated into MediaWiki (duration: 00m 13s)
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 11:49 hashar@deploy1002: Started deploy [integration/docroot@c2e48c9]: doc: Explain that VE is both stand-alone and integrated into MediaWiki
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php
* 11:46 Urbanecm: EU B&C done
* 11:45 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/TemplateData/modules/ext.templateDataGenerator.editTemplatePage/Dialog.js: {{Gerrit|a347517f906b07b2503ae559c6cc714e1c50e4aa}}: Fix suggested values not being shown when the params type isnt specified ([[phab:T280688|T280688]]) (duration: 00m 57s)
* 11:31 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:681137{{!}}Revert "Set wgPageImagesAPIDefaultLicense to 'any' for wikidata"]] (duration: 00m 57s)
* 11:30 aborrero@cumin1001: START - Cookbook sre.dns.netbox
* 11:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2b5b640ad28bce1df20c2ca82654996d9cfc7630}}: Enable ContentTranslation as a default tool for 11 Wikipedias ([[phab:T279422|T279422]]) (duration: 00m 57s)
* 10:58 effie: restarting php-fpm in mw* clusters in codfw to pick up php7.2 update
* 10:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:682575{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 10:45 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:682575{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 10:38 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1004.wikimedia.org
* 10:37 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Setup wmgUseFooterCodeOfConductLink for later usage (duration: 00m 57s)
* 10:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 10:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
* 10:26 effie: upgrading mw* servers  php7.2  in codfw
* 10:25 marostegui: Deploy schema change on s4 codfw, lag will appear [[phab:T276292|T276292]]
* 10:24 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use wmgUseFooterTechCodeOfConductLink instead of wmgUseFooterCodeOfConductLink (duration: 00m 57s)
* 10:24 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica1004.wikimedia.org
* 10:22 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add wmgUseFooterTechCodeOfConductLink (duration: 00m 59s)
* 10:22 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1003.wikimedia.org
* 10:18 moritzm: installing systemd updates from buster 10.9 point release
* 10:07 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica1003.wikimedia.org
* 10:00 filippo@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 09:53 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2006.wikimedia.org
* 09:42 moritzm: installing clamav security updates on otrs1001
* 09:38 godog: reboot ms-be1062, kernel backtrace saved
* 09:26 filippo@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=eqiad
* 09:26 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2006.wikimedia.org
* 09:24 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2005.wikimedia.org
* 09:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 09:15 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 09:13 jayme: imported etcd-mirror_0.0.6-2 to buster-wikimedia
* 09:10 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2005.wikimedia.org
* 09:07 jmm@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica2005failoid1002.wikimedia.org
* 09:04 jayme: imported etcd-mirror_0.0.6-1 to buster-wikimedia
* 08:55 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host ldap-replica2005failoid1002.wikimedia.org
* 08:49 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: {{Gerrit|f01a6dab70f74938dd51668809a181a8f551b6c8}}: GrowthExperiments: Enable community configuration on testwiki ([[phab:T274520|T274520]]) (duration: 00m 57s)
* 08:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: NOOP: {{Gerrit|88da8226823e59d1d19db9aeca3b5a5140c0c60c}}: GrowthExperiments: Do not enable community configuration outside of beta wikis ([[phab:T274520|T274520]]) (duration: 00m 59s)
* 08:28 moritzm: update debmonitor to 0.2.9 on remaining hosts [[phab:T281090|T281090]]
* 08:13 moritzm: installing lxml security updates on stretch
* 07:54 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 07:54 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on conf2005.codfw.wmnet with reason: for initial etcd replication
* 07:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
* 07:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
* 07:32 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 07:24 moritzm: installing pear security updates
* 07:09 moritzm: removed rawdog from bullseye-wikimedia, needs Py2 [[phab:T280989|T280989]]
* 06:24 elukey: reboot an-coord1001 to pick up kernel security settings (after reimage)
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1158 to dbctl, depooled, [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15521 and previous config saved to /var/cache/conftool/dbconfig/20210426-054700-marostegui.json
* 05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE
* 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1124.eqiad.wmnet with reason: REIMAGE
* 03:43 kart_: Updated cxserver to 2021-04-21-044024-production ([[phab:T279045|T279045]])
* 03:41 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:37 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 03:32 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .


== 2021-04-25 ==
== 2021-07-27 ==
* 15:23 Amir1: sudo -u list /var/lib/mailman/bin/change_pw -l wikica-l -p $(pwgen -c1 -s 12) ([[phab:T281066|T281066]])
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 14:33 mmandere: depool cp10[79-82]).eqiad.wmnet - [[phab:T286061|T286061]]
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:11 moritzm: installing aspell security updates
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support.  Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 11:23 Lucas_WMDE: EU backport+config window done
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 08:57 _joe_: repooling mw225[12] for apis
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 08:36 jynus: reenabled puppet on mwmaint1002
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 07:52 jynus: disabling puppet on mwmaint1002
* 07:14 moritzm: installing krb security updates on buster
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json


== 2021-04-24 ==
== 2021-07-26 ==
* 22:24 bstorm: Rebooting labstore1007 from ilo after crash
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}},
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 16:06 legoktm: depooled mw2336.codfw.mwnet, mgmt is down too. [[phab:T287394|T287394]]
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 06:39 moritzm: installing krb5 security updates
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki


== 2021-04-23 ==
== 2021-07-24 ==
* 21:36 foks: removing 1 file for legal compliance
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php  --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
* 20:15 mutante: [apt1001:~] $ sudo -i reprepro -C main includedeb bullseye-wikimedia /home/dzahn/rawdog_2.23-2_all.deb ([[phab:T280989|T280989]])
* 19:41 mutante: [apt1001:~] $ sudo -i reprepro copy bullseye-wikimedia buster-wikimedia envoyproxy - copy envoy package from buster to bullseye [[phab:T280989|T280989]]
* 19:09 ebernhardson: closing duplicate/wrong cluster indices in cloudelastic
* 17:02 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet
* 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:32 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:59 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 14:59 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 14:25 moritzm: revert back bullseye image to daily build from last week (to rule out potential reimage issue)
* 13:33 elukey: roll restart of all thanos-swift proxies to pick up new ML account - [[phab:T280773|T280773]]
* 12:50 jbond42: upload new debmonitor-client packages
* 11:50 moritzm: installing perf updates from Buster 10.9 point release
* 10:06 moritzm: installing Linux 4.19.181 updates from Buster 10.9 point release (no reboots, just updating the packages)
* 09:54 moritzm: installing xen security updates
* 09:49 moritzm: installing xorg-server security updates
* 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15512 and previous config saved to /var/cache/conftool/dbconfig/20210423-093723-root.json
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15511 and previous config saved to /var/cache/conftool/dbconfig/20210423-092220-root.json
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15510 and previous config saved to /var/cache/conftool/dbconfig/20210423-090716-root.json
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15509 and previous config saved to /var/cache/conftool/dbconfig/20210423-085212-root.json
* 08:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1020.eqiad.wmnet
* 08:21 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1020.eqiad.wmnet
* 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1021.eqiad.wmnet
* 08:13 moritzm: upgrading d-i image for bullseye to RC1 release [[phab:T275873|T275873]]
* 08:12 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1021.eqiad.wmnet
* 08:12 moritzm: upgrading d-i image for bullseye to RC1 release
* 08:12 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be1019.eqiad.wmnet
* 07:59 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1019.eqiad.wmnet
* 07:56 jynus: deleting db1156 s2 database and reloading it from logical backups [[phab:T280492|T280492]]
* 07:22 Amir1: removing junk bounced email addresses from yahoo from all mailing lists
* 05:40 marostegui: Stop db1079 to clone db1158 (lag will appear on s7 on wiki replicas)
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 to clone db1158 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15506 and previous config saved to /var/cache/conftool/dbconfig/20210423-053907-marostegui.json


== 2021-04-22 ==
== 2021-07-23 ==
* 17:26 marostegui: Stop mysql on tendril/dbtree database
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 16:33 volker-e@deploy1002: Finished deploy [design/style-guide@e914e8a]: Deploy design/style-guide: {{Gerrit|e914e8a}} icons: Add 'share' icon (#455) (duration: 00m 06s)
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 16:32 volker-e@deploy1002: Started deploy [design/style-guide@e914e8a]: Deploy design/style-guide: {{Gerrit|e914e8a}} icons: Add 'share' icon (#455)
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 13:23 marostegui: Tendril and dbtree are up but on a degraded status (slow reponse)
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 13:19 marostegui: Tendril and dbtree are down at the moment
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 12:46 Urbanecm: Start server-side upload for 2 video files ([[phab:T280763|T280763]], [[phab:T280524|T280524]])
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 12:31 marostegui: Restart mysql on db1115 (tendril/dbtree will fail)
* 16:15 effie: enable puppet on mc-gp* hosts
* 04:55 eileen: civicrm revision changed from {{Gerrit|42ca3cf65a}} to {{Gerrit|33a63d5789}}, config revision is {{Gerrit|cf07e7ba0b}}
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 02:47 krinkle@deploy1002: Finished deploy [integration/docroot@010e445]: (no justification provided) (duration: 00m 09s)
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 02:47 krinkle@deploy1002: Started deploy [integration/docroot@010e445]: (no justification provided)
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 01:34 eileen: civicrm revision changed from {{Gerrit|35a8dd33ba}} to {{Gerrit|42ca3cf65a}}, config revision is {{Gerrit|cf07e7ba0b}}
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 00:28 legoktm: legoktm@deneb:/var/cache/pbuilder/aptcache$ sudo rm -rf * # Cleaned up 8GB more
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 00:27 legoktm: legoktm@deneb:/var/cache/apt/archives$ sudo rm -rf * # cleaned up 6GB
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 00:03 legoktm: subscribed all list admins to the listadmins@ mailing list ([[phab:T280716|T280716]])
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]


== 2021-04-21 ==
== 2021-07-22 ==
* 23:58 eileen: tools revision changed from {{Gerrit|3d950fffbd}} to {{Gerrit|c26a8c0cb6}}
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 23:49 legoktm: made myself and Amir1 list admins for the listadmins@lists.wikimedia.org mailing list
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platfom migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 20:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1017.eqiad.wmnet
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 20:21 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1017.eqiad.wmnet
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 20:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1016.eqiad.wmnet
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 20:03 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1016.eqiad.wmnet
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 19:59 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host planet1003.eqiad.wmnet
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 19:52 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 19:48 robh@cumin1001: START - Cookbook sre.dns.netbox
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 19:48 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 19:46 mutante: creating a ganeti VM to test bullseye install
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 19:46 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host planet1003.eqiad.wmnet
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 19:45 bstorm: manually kicking off a run of update-openstack-mirror on sodium to capture an upstream package update
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 19:15 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 18:46 Urbanecm: Morning B&C done
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 18:42 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikibaseMediaInfo/: {{Gerrit|f831d16e42e712832d683233a5b21ad59f7c73b3}}: Make the logistic regression image search default ([[phab:T271799|T271799]]) (duration: 00m 58s)
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 18:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f6d076a69607172475a86ba935a273e7519108d1}}: Update $wgGEHomepageNewAccountVariants ([[phab:T278123|T278123]]) (duration: 00m 58s)
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 18:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1ae5ca5467fad7bfdae8aa94b241fe6c048ab8e5}}: Set wgGEMentorshipMigrationStage to WRITE_BOTH/READ_NEW everywhere ([[phab:T279853|T279853]]) (duration: 00m 59s)
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e252de0482c60e87e06d866006bb9ceb186af6cf}}: eswiki: Push Growth features out of dark mode ([[phab:T278235|T278235]]) (duration: 01m 00s)
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 17:43 jynus: deploy grant changes on m5 backup sources (db1117 and db2078) [[phab:T278614|T278614]]
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 15:54 legoktm: [[phab:T280744|T280744]]: legoktm@lists1001:~$ sudo chmod 644 /etc/aliases
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 15:15 Urbanecm: urbanecm@mwmaint1002:~$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php # [[phab:T279853|T279853]]
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15503 and previous config saved to /var/cache/conftool/dbconfig/20210421-151526-root.json
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 15:02 moritzm: installing jquery security updates on buster
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15502 and previous config saved to /var/cache/conftool/dbconfig/20210421-150023-root.json
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15501 and previous config saved to /var/cache/conftool/dbconfig/20210421-144519-root.json
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15500 and previous config saved to /var/cache/conftool/dbconfig/20210421-143015-root.json
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 14:25 jbond42: upload new version of debmonitor-client to apt
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 13:54 Urbanecm: [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=fawiki # [[phab:T279853|T279853]]
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 13:39 moritzm: upgrading mw1262-1265,mw1277-1279 to PHP 7.2.34
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 13:18 Urbanecm: [urbanecm@mwmaint1002 ~]$ time mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=frwiki # [[phab:T279853|T279853]]
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 13:01 moritzm: upgrading mw1262-1265,mw1277-1279 to PHP 7.2.34
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 12:21 moritzm: installing failoid2002
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: REIMAGE
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 11:49 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 11:46 jbond@cumin1001: START - Cookbook sre.dns.netbox
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 11:32 awight: EU backport window complete
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 11:31 moritzm: installing failoid1002
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 11:29 awight@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/WikimediaEvents: Backport: [[gerrit:681334{{!}}Send 0 edits userEditCountBucket for anons (T210106)]] (duration: 00m 59s)
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 10:41 jbond42: switch debmonitor-client to cfssl (second try)
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 10:37 jbond42: upload golang-cfssl packages for jessi and stretch
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 10:33 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host failoid1002.eqiad.wmnet
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 10:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1002.eqiad.wmnet
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 10:23 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host failoid1002.eqiad.wmnet
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 10:22 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host eventlog1002.eqiad.wmnet
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 10:21 jmm@cumin2001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host failoid2002.codfw.wmnet
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 10:21 hnowlan: rebooting eventlog1002 for kernel update
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 10:06 jmm@cumin2001: START - Cookbook sre.ganeti.makevm for new host failoid2002.codfw.wmnet
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 09:56 jbond42: switch debmonitor-clients to use cfssl
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15496 and previous config saved to /var/cache/conftool/dbconfig/20210421-093109-root.json
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15495 and previous config saved to /var/cache/conftool/dbconfig/20210421-091605-root.json
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 09:08 elukey: upgrade hue on an-tool1009 to 4.9
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 09:05 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 09:05 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 09:03 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=mw2280.codfw.wmnet,service=nginx
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15494 and previous config saved to /var/cache/conftool/dbconfig/20210421-090100-root.json
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1009.eqiad.wmnet
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 08:58 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 08:58 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 08:58 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 08:58 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 08:56 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 08:55 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 08:55 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1009.eqiad.wmnet
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 08:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1008.eqiad.wmnet
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 08:53 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 05s)
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 08:52 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 08:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1007.eqiad.wmnet
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 08:50 filippo@deploy1002: Finished deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]] (duration: 00m 10s)
* 14:27 moritzm: installing libwebp security updates on stretch
* 08:50 filippo@deploy1002: Started deploy [librenms/librenms@692b5d5]: Upgrade LibreNMS to 21.4.0 - [[phab:T266987|T266987]]
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 08:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1008.eqiad.wmnet
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:47 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1007.eqiad.wmnet
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 08:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1005.eqiad.wmnet
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1006.eqiad.wmnet
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15493 and previous config saved to /var/cache/conftool/dbconfig/20210421-084555-root.json
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 08:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1006.eqiad.wmnet
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 08:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1005.eqiad.wmnet
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 08:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1004.eqiad.wmnet
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 08:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1003.eqiad.wmnet
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1003.eqiad.wmnet
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 08:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1002.eqiad.wmnet
* 11:36 Lucas_WMDE: EU backport+config window done
* 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 08:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1002.eqiad.wmnet
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 08:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 08:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2009.codfw.wmnet
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 08:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2009.codfw.wmnet
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2008.codfw.wmnet
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 08:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2007.codfw.wmnet
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 07:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2008.codfw.wmnet
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 07:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2007.codfw.wmnet
* 10:42 effie: restart pybal on lvs2010  and lvs1016
* 07:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2006.codfw.wmnet
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 07:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2005.codfw.wmnet
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 07:53 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2006.codfw.wmnet
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 07:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2005.codfw.wmnet
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 07:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-coord1001.eqiad.wmnet with reason: REIMAGE
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 07:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2003.codfw.wmnet
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:50 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-coord1001.eqiad.wmnet with reason: REIMAGE
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 07:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 07:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2003.codfw.wmnet
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2002.codfw.wmnet
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 07:44 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2001.codfw.wmnet
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 07:39 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2002.codfw.wmnet
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 07:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ores2001.codfw.wmnet
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 06:49 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 06:49 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 06:42 elukey: upload hue_4.9.0-2+deb10u1 to buster-wikimedia
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 06:11 marostegui: Stop MySQL on db1074 to clone db1156 (there will be lag in s2 in wiki replicas) [[phab:T258361|T258361]]
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 to clone db1156 [[phab:T258361|T258361]]', diff saved to https://phabricator.wikimedia.org/P15491 and previous config saved to /var/cache/conftool/dbconfig/20210421-061019-marostegui.json
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 06:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2082.codfw.wmnet with reason: REIMAGE
* 06:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2077.codfw.wmnet with reason: REIMAGE
* 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2082.codfw.wmnet with reason: REIMAGE
* 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2077.codfw.wmnet with reason: REIMAGE
* 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1086.eqiad.wmnet
* 05:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1086.eqiad.wmnet
* 00:38 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
* 00:36 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
* 00:15 ryankemper: [WDQS] Pooled `wdqs1003`
* 00:14 ryankemper: [WDQS] Pooled `wdqs2008`
* 00:07 ryankemper: `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1006.eqiad.wmnet`
* 00:04 ryankemper: [WDQS] pooled `wdqs1004`


== 2021-04-20 ==
== 2021-07-21 ==
* 23:46 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|73544ccb40d9687b54c039aceb05cd033901d86f}}: urwiki: Enable Growth team features in stealth mode ([[phab:T280067|T280067]]) (duration: 00m 57s)
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 23:44 urbanecm@deploy1002: Synchronized wmf-config/config/urwiki.yaml: {{Gerrit|73544ccb40d9687b54c039aceb05cd033901d86f}}: urwiki: Enable Growth team features in stealth mode ([[phab:T280067|T280067]]) (duration: 00m 57s)
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|73544ccb40d9687b54c039aceb05cd033901d86f}}: urwiki: Enable Growth team features in stealth mode ([[phab:T280067|T280067]]) (duration: 00m 58s)
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 23:38 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=urwiki GrowthExperiments # [[phab:T280067|T280067]]
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:38 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|314367bca6e924136704911b55fd3e2c929fa704}}: elwiki: Enable Growth team features in stealth mode ([[phab:T280172|T280172]]; 3/3) (duration: 00m 56s)
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 23:36 urbanecm@deploy1002: Synchronized wmf-config/config/elwiki.yaml: {{Gerrit|314367bca6e924136704911b55fd3e2c929fa704}}: elwiki: Enable Growth team features in stealth mode ([[phab:T280172|T280172]]; 2/3) (duration: 00m 57s)
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 23:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|314367bca6e924136704911b55fd3e2c929fa704}}: elwiki: Enable Growth team features in stealth mode ([[phab:T280172|T280172]]; 1/3) (duration: 00m 57s)
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:34 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php --wiki=hrwiki --delete
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 23:32 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=elwiki GrowthExperiments # [[phab:T280172|T280172]]
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 23:31 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|425d77b73f48b3e16a5aa2c0086f292d370cd17e}}: cawiki: Enable Growth team features in stealth mode ([[phab:T280673|T280673]]; 3/3) (duration: 00m 57s)
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:28 Urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist growthexperiments sql.php --cluster=extension1 /srv/mediawiki/php-1.37.0-wmf.1/extensions/GrowthExperiments/maintenance/schemas/mysql/growthexperiments_mentee_data.sql # [[phab:T279587|T279587]]
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 23:28 urbanecm@deploy1002: Synchronized wmf-config/config/cawiki.yaml: {{Gerrit|425d77b73f48b3e16a5aa2c0086f292d370cd17e}}: cawiki: Enable Growth team features in stealth mode ([[phab:T280673|T280673]]; 2/3) (duration: 00m 57s)
* 20:27 dancy: testing upcoming Scap release on beta
* 23:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|425d77b73f48b3e16a5aa2c0086f292d370cd17e}}: cawiki: Enable Growth team features in stealth mode ([[phab:T280673|T280673]]; 1/3) (duration: 00m 57s)
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 23:24 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=cawiki GrowthExperiments # [[phab:T280673|T280673]]
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 23:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on furud.codfw.wmnet with reason: REIMAGE
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 23:09 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on furud.codfw.wmnet with reason: REIMAGE
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 23:05 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flerovium.eqiad.wmnet with reason: REIMAGE
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 23:03 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flerovium.eqiad.wmnet with reason: REIMAGE
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 22:14 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 22:10 robh@cumin1001: START - Cookbook sre.dns.netbox
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 20:52 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=ruwiki # [[phab:T279853|T279853]]
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 20:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1020.wikimedia.org
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 20:41 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=viwiki # [[phab:T279853|T279853]]
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 20:36 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1020.wikimedia.org
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 20:36 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=ukwiki # [[phab:T279853|T279853]]
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 20:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd[1017-1019].wikimedia.org
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 20:34 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=tewiki # [[phab:T279853|T279853]]
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 20:32 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=svwiki # [[phab:T279853|T279853]]
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 20:30 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=srwiki # [[phab:T279853|T279853]]
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 20:29 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=rowiki # [[phab:T279853|T279853]]
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 20:27 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hywiki # [[phab:T279853|T279853]]
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 20:22 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=huwiki # [[phab:T279853|T279853]]
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 20:21 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hrwiki # [[phab:T279853|T279853]]
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 20:18 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=hewiki # [[phab:T279853|T279853]]
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 20:16 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=frwiktionary # [[phab:T279853|T279853]]
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 20:16 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd[1017-1019].wikimedia.org
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 20:15 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=euwiki # [[phab:T279853|T279853]]
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 20:13 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=bnwiki # [[phab:T279853|T279853]]
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 20:08 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 20:03 robh@cumin1001: START - Cookbook sre.dns.netbox
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 19:58 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 19:56 robh@cumin1001: START - Cookbook sre.dns.netbox
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 19:28 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcephosd1016.wikimedia.org
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 18:34 Urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=idwiki # [[phab:T279853|T279853]]
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 18:33 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1016.wikimedia.org
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GrowthExperiments/: {{Gerrit|4d1969d}}: {{Gerrit|1fbb8e9}}: MentorStore: Set wasPosted to true in command line mode ([[phab:T275773|T275773]]) (duration: 00m 59s)
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 17:26 XioNoX: boot cr1-codfw:fpc1 - [[phab:T277341|T277341]]
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 17:16 papaul: Adding a MPC7E to cr1-codfw
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 16:32 arturo: merging change to core route firewall https://gerrit.wikimedia.org/r/c/operations/homer/public/+/681316 ([[phab:T272587|T272587]])
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 16:15 andrewbogott: updating core routers config with https://gerrit.wikimedia.org/r/c/operations/homer/public/+/681315
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 15:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host eventlog1003.eqiad.wmnet
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 15:22 urbanecm@deploy1002: Synchronized docroot/noc/conf/debug.json: {{Gerrit|dc6647b9c674429c0811116e0caca7639b766e77}}: remove mwdebug1003 from list of debug servers ([[phab:T267248|T267248]]) (duration: 00m 58s)
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 15:20 urbanecm@deploy1002: Synchronized debug.json: {{Gerrit|dc6647b9c674429c0811116e0caca7639b766e77}}: remove mwdebug1003 from list of debug servers ([[phab:T267248|T267248]]) (duration: 00m 57s)
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 15:14 hnowlan@cumin1001: START - Cookbook sre.ganeti.makevm for new host eventlog1003.eqiad.wmnet
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 15:08 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 15:02 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 14:59 volker-e@deploy1002: Finished deploy [design/style-guide@c4d8314]: Deploy design/style-guide: {{Gerrit|c4d8314}} “Components”: Fix “Buttons” active states (#460) (duration: 00m 07s)
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 14:58 volker-e@deploy1002: Started deploy [design/style-guide@c4d8314]: Deploy design/style-guide: {{Gerrit|c4d8314}} “Components”: Fix “Buttons” active states (#460)
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 14:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:50 moritzm: installing systemd security updates on bullseye
* 14:38 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 14:37 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 10:14 effie: enable puppet on mw* servers
* 14:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 14:34 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 14:31 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 14:30 moritzm: installing exim updates from Buster point release
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 14:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 08:45 effie: disble puppet on codfw mw hosts to deploy 702592
* 14:25 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc6767a] (duration: 04m 56s)
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:25 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 14:24 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 14:22 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 14:20 otto@deploy1002: Started deploy [analytics/refinery@fc6767a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc6767a]
* 08:17 effie: enable puppet on alert*
* 14:18 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 14:18 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 14:17 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a] (thin): Regular analytics weekly train THIN [analytics/refinery@fc6767a] (duration: 00m 07s)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 14:17 otto@deploy1002: Started deploy [analytics/refinery@fc6767a] (thin): Regular analytics weekly train THIN [analytics/refinery@fc6767a]
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:16 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a] (duration: 00m 03s)
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 14:16 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry [analytics/refinery@fc6767a]
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 14:16 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 14:16 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 07:16 godog: powercycle ms-be2048
* 14:16 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a] (duration: 00m 03s)
* 07:03 moritzm: installing systemd security updates on stretch
* 14:15 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a]
* 06:51 effie: restart memcached on eqiad mc* hosts
* 14:15 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a] (duration: 00m 03s)
* 06:51 effie: enable puppet on mc* hosts
* 14:14 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train - an-launcher1002 retry\ [analytics/refinery@fc6767a]
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 14:14 otto@deploy1002: Finished deploy [analytics/refinery@fc6767a]: Regular analytics weekly train [analytics/refinery@fc6767a] (duration: 14m 50s)
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:11 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:09 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:06 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:06 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:04 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:01 jiji@cumin1001: conftool action : set/pooled=no; selector: name=mw2280.codfw.wmnet,cluster=videoscaler
* 13:59 otto@deploy1002: Started deploy [analytics/refinery@fc6767a]: Regular analytics weekly train [analytics/refinery@fc6767a]
* 13:42 moritzm: upgrading mw1276 to PHP 7.2.34
* 13:40 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:40 ayounsi@deploy1002: Finished deploy [homer/deploy@759f82c]: Homer release v0.2.7 (duration: 00m 13s)
* 13:40 ayounsi@deploy1002: Started deploy [homer/deploy@759f82c]: Homer release v0.2.7
* 13:38 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:36 otto@deploy1002: Finished deploy [analytics/aqs/deploy@ad170d4]: deploy Refactor pageviews per-article endpoint (duration: 05m 17s)
* 13:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:33 moritzm: upgrading mw1261 to PHP 7.2.34
* 13:31 otto@deploy1002: Started deploy [analytics/aqs/deploy@ad170d4]: deploy Refactor pageviews per-article endpoint
* 13:27 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:26 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'similar-users' for release 'main' .
* 13:22 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:21 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'push-notifications' for release 'main' .
* 13:15 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 13:13 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/includes/actions/RollbackAction.php: {{Gerrit|ccbfcf28a2f507ed40dcf7af748c30f581b5079f}}: Do not mark rollbacks as bot edits ([[phab:T280655|T280655]]) (duration: 00m 57s)
* 13:12 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 13:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2002.codfw.wmnet with reason: REIMAGE
* 13:09 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:07 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2076.codfw.wmnet with reason: REIMAGE
* 13:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2076.codfw.wmnet with reason: REIMAGE
* 12:58 moritzm: reimaging cumin2002 to bullseye [[phab:T276589|T276589]]
* 12:55 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 12:54 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 12:52 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 12:51 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:49 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:47 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'recommendation-api' for release 'production' .
* 12:42 moritzm: uploaded PHP 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf1 to component/php72
* 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 to check its tables [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15483 and previous config saved to /var/cache/conftool/dbconfig/20210420-124118-marostegui.json
* 12:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5003.eqsin.wmnet
* 12:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 12:27 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 12:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams-internal' for release 'main' .
* 12:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti5003.eqsin.wmnet
* 12:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 12:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 12:18 CFisch_WMDE: European mid-day backport window done
* 12:05 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:681321{{!}}Add NS_PROJECT alias for azwiki (T280577)]] (duration: 00m 57s)
* 12:04 moritzm: drain ganeti5003
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:55 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
* 11:54 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/includes/CommentFormatter.php: Backport: [[gerrit:681153{{!}}CommentFormatter: Add ext-discussiontools-section class instead of overwriting (T280433)]] (duration: 00m 57s)
* 11:47 moritzm: failover ganeti master in eqsin to ganeti5001
* 11:46 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
* 11:38 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/VisualEditor/modules/ve-mw/ui/pages/ve.ui.MWParameterPage.js: Backport: [[gerrit:679462{{!}}Add filtering for the suggested values combo box (T271898)]] (duration: 00m 58s)
* 11:15 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:676930{{!}}Add default import sources (T214139)]] (duration: 00m 58s)
* 11:11 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:09 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 11:07 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 10:49 _joe_: temporary installing some python packages on deploy1002 for testing
* 10:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5001.eqsin.wmnet
* 10:34 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host ganeti5001.eqsin.wmnet
* 10:20 moritzm: drain ganeti5001
* 10:11 hnowlan: opening access to cassandra on new AQS hosts (aqs101*) to analytics-in4 filter
* 10:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict1001.eqiad.wmnet
* 10:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host aphlict1001.eqiad.wmnet
* 09:42 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin2001.codfw.wmnet,cumin1001.eqiad.wmnet
* 09:42 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin2001.codfw.wmnet,cumin1001.eqiad.wmnet
* 09:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 09:40 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 09:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1002.eqiad.wmnet with reason: REIMAGE
* 09:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1001.eqiad.wmnet with reason: REIMAGE
* 09:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 09:20 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 08:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 08:58 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 08:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
* 08:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
* 08:50 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 08:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1003.eqiad.wmnet
* 08:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1003.eqiad.wmnet
* 08:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1004.eqiad.wmnet
* 08:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1004.eqiad.wmnet
* 08:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2128.codfw.wmnet with reason: REIMAGE
* 08:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2004.codfw.wmnet
* 08:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2128.codfw.wmnet with reason: REIMAGE
* 08:09 dcaro: reprepro updating thirdparty/ceph-octopus repo
* 08:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2004.codfw.wmnet
* 08:07 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
* 08:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2003.codfw.wmnet
* 08:05 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
* 08:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2003.codfw.wmnet
* 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1086 from dbctl [[phab:T278229|T278229]]', diff saved to https://phabricator.wikimedia.org/P15482 and previous config saved to /var/cache/conftool/dbconfig/20210420-075949-marostegui.json
* 07:38 XioNoX: BGP: prioritize directly connected peers - [[phab:T280054|T280054]]
* 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15480 and previous config saved to /var/cache/conftool/dbconfig/20210420-073808-root.json
* 07:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
* 07:33 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15479 and previous config saved to /var/cache/conftool/dbconfig/20210420-072305-root.json
* 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15478 and previous config saved to /var/cache/conftool/dbconfig/20210420-070801-root.json
* 07:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
* 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15477 and previous config saved to /var/cache/conftool/dbconfig/20210420-065257-root.json
* 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2127.codfw.wmnet with reason: REIMAGE
* 06:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2127.codfw.wmnet with reason: REIMAGE
* 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2073.codfw.wmnet with reason: REIMAGE
* 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
* 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2073.codfw.wmnet with reason: REIMAGE
* 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2105.codfw.wmnet with reason: REIMAGE
* 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2074.codfw.wmnet with reason: REIMAGE
* 06:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2105.codfw.wmnet with reason: REIMAGE


== 2021-04-19 ==
== 2021-07-20 ==
* 22:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 22:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 22:37 Trey314159: reindexing wikidata on cloudelastic finished/failed ([[phab:T274200|T274200]])
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 22:37 Trey314159: reindexing commons and wikidata on elastic@eqiad finished/failed ([[phab:T274200|T274200]])
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.wikimedia.org with reason: REIMAGE
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.wikimedia.org with reason: REIMAGE
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 21:03 sbassett: Deployed security patch for [[phab:T280226|T280226]]
* 17:06 rzl: enabled puppet on A:mw
* 19:56 dcausse: repool wdqs1005
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 19:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2004.codfw.wmnet
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 19:04 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2004.codfw.wmnet
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 18:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2003.codfw.wmnet
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 18:56 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/tests: Factor out rollback logic from WikiPage - /tests (duration: 00m 59s)
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 18:55 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/maintenance: Factor out rollback logic from WikiPage - /maintenance (duration: 00m 57s)
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:51 ppchelko@deploy1002: Synchronized php-1.37.0-wmf.1/includes/: Factor out rollback logic from WikiPage - /includes (duration: 01m 01s)
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 18:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2003.codfw.wmnet
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:47 jiji@cumin1001: conftool action : set/pooled=yes; selector: cluster=thumbor,name=thumbor2001.codfw.wmnet
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 18:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2002.codfw.wmnet
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 18:39 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T274436|T274436]] Math: Enable RESTBase-less Wikidata math validation (duration: 00m 56s)
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 18:34 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2002.codfw.wmnet
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 18:21 ppchelko@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T249745|T249745]] [EventBus] Make eventage-main timeout consistent with envoy (duration: 00m 56s)
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 18:13 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/DiscussionTools/: {{Gerrit|66d137b75a7073c7162c443cc8c6ec6f3be714e0}}: Remove <header> tags around headings for compat with MobileFrontend ([[phab:T280433|T280433]]) (duration: 00m 59s)
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 18:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor2001.codfw.wmnet
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 18:02 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.1/extensions/GrowthExperiments/includes/Mentorship/Store/DatabaseMentorStore.php: {{Gerrit|0233507470377f6ac45768e345cd2e359e5d0e57}}: DatabaseMentorStore: Fix deprecation warning in upsert query ([[phab:T280525|T280525]]) (duration: 00m 57s)
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:47 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor2001.codfw.wmnet
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 17:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1004.eqiad.wmnet
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 17:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1004.eqiad.wmnet
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 17:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1003.eqiad.wmnet
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
* 17:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1003.eqiad.wmnet
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 17:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1002.eqiad.wmnet
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
* 16:57 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1002.eqiad.wmnet
* 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 16:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thumbor1001.eqiad.wmnet
* 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 16:48 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host thumbor1001.eqiad.wmnet
* 14:53 urbanecm: Start server-side upload for 7 large PNG files ([[phab:T285708|T285708]])
* 16:25 hoo: Updated the Wikidata property suggester with data from the 2021-04-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
* 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15474 and previous config saved to /var/cache/conftool/dbconfig/20210419-161134-root.json
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 90%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15473 and previous config saved to /var/cache/conftool/dbconfig/20210419-155631-root.json
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 80%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15472 and previous config saved to /var/cache/conftool/dbconfig/20210419-154127-root.json
* 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 70%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15471 and previous config saved to /var/cache/conftool/dbconfig/20210419-152623-root.json
* 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 15:24 volans: reverted debmonitor-client to 0.2.0-1 on apt.w.o for jessie-wikimedia
* 14:46 vgutierrez: depool dns1002 - [[phab:T286069|T286069]]
* 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 60%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15470 and previous config saved to /var/cache/conftool/dbconfig/20210419-151119-root.json
* 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15469 and previous config saved to /var/cache/conftool/dbconfig/20210419-145616-root.json
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:53 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Rename RelatedArticles wmg variables to wg (duration: 00m 56s)
* 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 14:53 jbond42: update debmonitor-client - [[phab:T280484|T280484]]
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:52 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove RelatedArticles extension function and wmg to wg mapping (duration: 00m 56s)
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:48 reedy@deploy1002: Synchronized wmf-config/PoolCounterSettings.php: Use namespaced PoolCounter Client (duration: 00m 57s)
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1086 [[phab:T278229|T278229]]', diff saved to https://phabricator.wikimedia.org/P15468 and previous config saved to /var/cache/conftool/dbconfig/20210419-144422-marostegui.json
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 40%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15467 and previous config saved to /var/cache/conftool/dbconfig/20210419-144112-root.json
* 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:41 volans: uploaded debmonitor-client 0.2.8 to apt.w.o for jessie, stretch, buster, bullseye
* 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 14:29 hnowlan: imported envoyproxy_1.16.3-1 debs to envoy-future component
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 30%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15466 and previous config saved to /var/cache/conftool/dbconfig/20210419-142608-root.json
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 20%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15465 and previous config saved to /var/cache/conftool/dbconfig/20210419-141105-root.json
* 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 15%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15464 and previous config saved to /var/cache/conftool/dbconfig/20210419-135601-root.json
* 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15463 and previous config saved to /var/cache/conftool/dbconfig/20210419-134057-root.json
* 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: Slowly pool db1182 for the first time in s2 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15462 and previous config saved to /var/cache/conftool/dbconfig/20210419-132554-root.json
* 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15461 and previous config saved to /var/cache/conftool/dbconfig/20210419-131936-marostegui.json
* 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15460 and previous config saved to /var/cache/conftool/dbconfig/20210419-131501-marostegui.json
* 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 12:58 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bd076306c0ae0428ff13743f499b2a02d42b6eab}}: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD everywhere ([[phab:T279853|T279853]]) (duration: 00m 57s)
* 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P15459 and previous config saved to /var/cache/conftool/dbconfig/20210419-125600-marostegui.json
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1182 in s2 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15458 and previous config saved to /var/cache/conftool/dbconfig/20210419-125407-marostegui.json
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1182 to dbctl [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15457 and previous config saved to /var/cache/conftool/dbconfig/20210419-125301-marostegui.json
* 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
* 12:51 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ef0f68e2a9c1c638911bb06c47ba6e8ef88ee393}}: testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_NEW ([[phab:T279853|T279853]]) (duration: 00m 57s)
* 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10{{!}}0[1-9]).codfw.wmnet
* 12:38 Urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=cswiki # [[phab:T279853|T279853]]
* 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 12:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2126.codfw.wmnet with reason: REIMAGE
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 12:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3e3cce192f1e99cbcae739f234271411d10974ac}}: cswiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD ([[phab:T279853|T279853]]) (duration: 00m 58s)
* 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - [[phab:T285643|T285643]]
* 12:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2126.codfw.wmnet with reason: REIMAGE
* 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 12:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2072.codfw.wmnet with reason: REIMAGE
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 12:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2072.codfw.wmnet with reason: REIMAGE
* 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 11:39 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
* 12:44 moritzm: installing systemd security updates on buster
* 11:37 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
* 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
* 11:33 moritzm: imported debdeploy 0.0.99.13-1+deb11u1 to bullseye-wikimedia [[phab:T275873|T275873]]
* 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
* 11:27 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=testwiki --force # [[phab:T279853|T279853]]
* 11:58 Lucas_WMDE: EU config+backport window done
* 11:11 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/migrateMentorMenteeRelationship.php --wiki=testwiki # [[phab:T279853|T279853]]
* 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (3/3) (duration: 00m 56s)
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|03f8ed819091624f5ae4a8d7ed3631dc322fabcd}}: testwiki: wgGEMentorshipMigrationStage: Set to WRITE_BOTH/READ_OLD ([[phab:T279853|T279853]]) (duration: 00m 57s)
* 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:05 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:680871{{!}}Disable legacy javascript variable for the rest of wikis (T72470)]] (duration: 00m 57s)
* 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:02 moritzm: import promethus-rsyslog-exporter for bullseye-wikimedia/main
* 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (2/3) (duration: 00m 56s)
* 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (1/3) (duration: 00m 56s)
* 11:01 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
* 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 3/3) (duration: 00m 56s)
* 10:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: [[gerrit:681008{{!}} Bumping portals to master (T128546)]] (duration: 00m 57s)
* 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 2/3) (duration: 00m 56s)
* 10:45 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:681008{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 1/3) (duration: 00m 57s)
* 10:34 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705498{{!}}Add patroller group for ckbwiki (T285221)]] (duration: 00m 57s)
* 10:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on theemin.codfw.wmnet with reason: REIMAGE
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (2/2, beta) (duration: 00m 56s)
* 10:24 hnowlan: imported 1.16.3 into envoy-future
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (1/2, prod) (duration: 00m 57s)
* 10:22 moritzm: reimaging theemin to bullseye
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704867{{!}}Update config for language switching on pilot wikis (T286459)]] (duration: 00m 59s)
* 10:15 dcausse: depooling wdqs1005
* 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: REIMAGE
* 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:05 arturo: aborrero@apt1001:~ $ sudo -i reprepro --component thirdparty/kubeadm-k8s-1-18 update buster-wikimedia
* 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:04 arturo: aborrero@apt1001:~ $ sudo -i reprepro --delete clearvanished (remove old buster-wikimedia{{!}}thirdparty/kubeadm-k8s-1-15,16 repos and packages)
* 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:56 ema: cp3051: varnish-frontend-restart to apply exp policy settings changes starting from empty cache [[phab:T275809|T275809]]
* 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
* 09:53 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
* 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
* 09:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
* 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: REIMAGE
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15454 and previous config saved to /var/cache/conftool/dbconfig/20210419-092251-root.json
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 [[phab:T280492|T280492]]', diff saved to https://phabricator.wikimedia.org/P15453 and previous config saved to /var/cache/conftool/dbconfig/20210419-092234-marostegui.json
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15452 and previous config saved to /var/cache/conftool/dbconfig/20210419-091535-root.json
* 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
* 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 90%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15451 and previous config saved to /var/cache/conftool/dbconfig/20210419-090747-root.json
* 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
* 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15450 and previous config saved to /var/cache/conftool/dbconfig/20210419-090031-root.json
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 80%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15449 and previous config saved to /var/cache/conftool/dbconfig/20210419-085243-root.json
* 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15448 and previous config saved to /var/cache/conftool/dbconfig/20210419-084834-marostegui.json
* 03:17 eileen: civicrm revision changed from {{Gerrit|20e9ef6bbb}} to {{Gerrit|819c11307d}}, config revision is {{Gerrit|bb405c5232}}
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15447 and previous config saved to /var/cache/conftool/dbconfig/20210419-084528-root.json
* 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15446 and previous config saved to /var/cache/conftool/dbconfig/20210419-084523-marostegui.json
* 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 70%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15445 and previous config saved to /var/cache/conftool/dbconfig/20210419-083740-root.json
* 08:35 ema: restart debmonitor-client.service on cp4030, dns5002, an-worker1106 [[phab:T280484|T280484]]
* 08:34 marostegui: Testing log
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 100%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15444 and previous config saved to /var/cache/conftool/dbconfig/20210419-083021-root.json
* 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15443 and previous config saved to /var/cache/conftool/dbconfig/20210419-083018-root.json
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15442 and previous config saved to /var/cache/conftool/dbconfig/20210419-082559-marostegui.json
* 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 60%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15441 and previous config saved to /var/cache/conftool/dbconfig/20210419-082236-root.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15440 and previous config saved to /var/cache/conftool/dbconfig/20210419-082000-root.json
* 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 75%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15439 and previous config saved to /var/cache/conftool/dbconfig/20210419-081517-root.json
* 08:07 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on labstore1004.eqiad.wmnet with reason: Restarting mysql
* 08:07 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on labstore1004.eqiad.wmnet with reason: Restarting mysql
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15438 and previous config saved to /var/cache/conftool/dbconfig/20210419-080732-root.json
* 08:07 godog: swift eqiad-prod: less weight for ms-be[1019-1026] / more weight to ms-be106[0-3] - [[phab:T272836|T272836]]
* 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15437 and previous config saved to /var/cache/conftool/dbconfig/20210419-080456-root.json
* 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15436 and previous config saved to /var/cache/conftool/dbconfig/20210419-080454-root.json
* 08:03 moritzm: installing python-bleach security updates
* 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 50%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15435 and previous config saved to /var/cache/conftool/dbconfig/20210419-080013-root.json
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 40%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15434 and previous config saved to /var/cache/conftool/dbconfig/20210419-075229-root.json
* 07:51 moritzm: upgrade mwdebug2002 to PHP 7.2.34
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15433 and previous config saved to /var/cache/conftool/dbconfig/20210419-074953-root.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15432 and previous config saved to /var/cache/conftool/dbconfig/20210419-074950-root.json
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1085 (re)pooling @ 25%: Repool db1085', diff saved to https://phabricator.wikimedia.org/P15431 and previous config saved to /var/cache/conftool/dbconfig/20210419-074510-root.json
* 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15430 and previous config saved to /var/cache/conftool/dbconfig/20210419-074155-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 30%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15429 and previous config saved to /var/cache/conftool/dbconfig/20210419-073725-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15428 and previous config saved to /var/cache/conftool/dbconfig/20210419-073449-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15427 and previous config saved to /var/cache/conftool/dbconfig/20210419-073446-root.json
* 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15426 and previous config saved to /var/cache/conftool/dbconfig/20210419-073425-root.json
* 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 20%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15425 and previous config saved to /var/cache/conftool/dbconfig/20210419-072221-root.json
* 07:21 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 07:19 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15424 and previous config saved to /var/cache/conftool/dbconfig/20210419-071943-root.json
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15423 and previous config saved to /var/cache/conftool/dbconfig/20210419-071921-root.json
* 07:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15422 and previous config saved to /var/cache/conftool/dbconfig/20210419-071701-marostegui.json
* 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 15%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15421 and previous config saved to /var/cache/conftool/dbconfig/20210419-070718-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15420 and previous config saved to /var/cache/conftool/dbconfig/20210419-070439-root.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15419 and previous config saved to /var/cache/conftool/dbconfig/20210419-070418-root.json
* 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15418 and previous config saved to /var/cache/conftool/dbconfig/20210419-070035-marostegui.json
* 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15417 and previous config saved to /var/cache/conftool/dbconfig/20210419-065627-root.json
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Slowly pool db1179 for the first time in s3 [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15416 and previous config saved to /var/cache/conftool/dbconfig/20210419-065213-root.json
* 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P15415 and previous config saved to /var/cache/conftool/dbconfig/20210419-064914-root.json
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15414 and previous config saved to /var/cache/conftool/dbconfig/20210419-064600-marostegui.json
* 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15413 and previous config saved to /var/cache/conftool/dbconfig/20210419-064123-root.json
* 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15412 and previous config saved to /var/cache/conftool/dbconfig/20210419-062620-root.json
* 06:17 _joe_: upgrading envoy everywhere in eqiad [[phab:T280317|T280317]]
* 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15411 and previous config saved to /var/cache/conftool/dbconfig/20210419-061116-root.json
* 06:10 _joe_: upgrading envoy everywhere in codfw [[phab:T280317|T280317]]
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15410 and previous config saved to /var/cache/conftool/dbconfig/20210419-060321-marostegui.json
* 06:01 _joe_: rolling out further envoy upgrades [[phab:T280317|T280317]]
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 10%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15409 and previous config saved to /var/cache/conftool/dbconfig/20210419-055613-root.json
* 05:53 marostegui: Stop sanitarium master on s2 (lag will show up on clouddb* labsdb* hosts) [[phab:T272008|T272008]]
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15408 and previous config saved to /var/cache/conftool/dbconfig/20210419-055240-marostegui.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P15407 and previous config saved to /var/cache/conftool/dbconfig/20210419-054831-marostegui.json
* 05:42 marostegui: Stop sanitarium master on s1 (lag will show up on clouddb* labsdb* hosts) [[phab:T272008|T272008]]
* 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 [[phab:T272008|T272008]]', diff saved to https://phabricator.wikimedia.org/P15406 and previous config saved to /var/cache/conftool/dbconfig/20210419-054158-marostegui.json
* 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15405 and previous config saved to /var/cache/conftool/dbconfig/20210419-053730-marostegui.json
* 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1179 in s3 for the first time with minimal weight [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15404 and previous config saved to /var/cache/conftool/dbconfig/20210419-053127-marostegui.json
* 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1179 to dbctl [[phab:T275633|T275633]]', diff saved to https://phabricator.wikimedia.org/P15403 and previous config saved to /var/cache/conftool/dbconfig/20210419-053050-marostegui.json
* 05:05 marostegui: Restart m2 database master [[phab:T280251|T280251]]


== 2021-04-18 ==
== 2021-07-19 ==
* 06:40 Amir1: cleaning watchlist of User:Mr._Ibrahem in wikidatawiki (in main ns only)
* 20:48 urbanecm: Deploy security patch for [[phab:T286884|T286884]]
* 20:29 vgutierrez: pool text@codfw - [[phab:T286921|T286921]]
* 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
* 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467{{!}}prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)
* 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
* 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448{{!}}Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)
* 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - [[phab:T286921|T286921]]
* 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - [[phab:T286921|T286921]]
* 18:46 brennen: gerrit1001: restarting gerrit
* 18:40 vgutierrez: stop pybal on lvs2009  - [[phab:T286921|T286921]]
* 18:38 brennen: re-enabling puppet on gerrit1001]
* 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - [[phab:T286921|T286921]]
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P<nowiki>{</nowiki>cloudelastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - [[phab:T286921|T286921]]
* 18:20 vgutierrez: enabling pybal on lvs2007 - [[phab:T286921|T286921]]
* 18:19 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
* 18:06 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:705437{{!}}pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s)
* 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
* 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
* 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
* 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
* 17:30 volans: running puppet on elastic2038 after nework was restored
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
* 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 17:23 volans: running authdns-update to force-update authdns2001
* 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:21 XioNoX: remove ns1 redirect - [[phab:T286787|T286787]]
* 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
* 17:10 XioNoX: enable asw-a2-codfw access ports - [[phab:T286787|T286787]]
* 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - [[phab:T286787|T286787]]
* 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
* 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
* 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
* 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
* 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
* 16:40 XioNoX: update asw-a2-codfw serial number - [[phab:T286787|T286787]]
* 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
* 16:21 mutante: depooled logstash2021 for dcops maintenance work
* 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
* 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
* 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|310be45f7}} (duration: 00m 57s)
* 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
* 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
* 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2bdfbd258e}} (duration: 00m 57s)
* 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I069c7b53}} (duration: 00m 58s)
* 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
* 15:10 godog: +100G to prometheus/ops in codfw
* 14:59 vgutierrez: rolling restart of eqiad pybal instances
* 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 14:42 vgutierrez: rolling restart of codfw pybal instances
* 14:33 vgutierrez: rolling restart of eqsin pybal instances
* 14:23 vgutierrez: rolling restart of ulsfo pybal instances
* 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
* 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
* 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
* 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
* 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
* 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
* 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
* 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
* 11:40 moritzm: installing bluez security updates
* 11:31 Lucas_WMDE: EU backport+config window done
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:703205{{!}}Add config for updated PropertySuggester beta cluster (T285098)]] (beta-only) (duration: 00m 57s)
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 09:52 moritzm: imported megacli for bullseye-wikimedia [[phab:T282272|T282272]] [[phab:T275873|T275873]]
* 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
* 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
* 08:15 vgutierrez: depool codfw text traffic
* 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
* 03:26 twentyafterfour: restarted phd on phab1001
* 03:25 twentyafterfour: investigating PHD failure


== 2021-04-17 ==
== 2021-07-16 ==
* 16:16 Amir1: cleaning SuccuBot's watchlist in wikidatawiki
* 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 00:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1307.eqiad.wmnet
* 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 00:48 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1307.eqiad.wmnet
* 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 00:23 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1402.eqiad.wmnet
* 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 00:22 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1403.eqiad.wmnet
* 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1403.eqiad.wmnet
* 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
* 00:18 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1402.eqiad.wmnet
* 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P<nowiki>{</nowiki>elastic2*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
* 00:14 ryankemper: [[phab:T267927|T267927]] `sudo run-puppet-agent` and `sudo pool` on `wdqs2003`
* 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
* 00:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
* 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
* 00:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1307.eqiad.wmnet with reason: REIMAGE
* 15:48 vgutierrez: restart pybal on lvs2010
* 00:08 ryankemper: [[phab:T267927|T267927]] Reload of `wdqs2003` complete
* 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 00:07 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 00:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1403.eqiad.wmnet with reason: REIMAGE
* 15:24 godog: downtime flappy pages in codfw for 40 minutes
* 15:14 godog: set alert2001 as active in netbox (was staged) - [[phab:T247966|T247966]]
* 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
* 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
* 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw [[phab:T286787|T286787]]
* 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
* 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
* 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
* 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
* 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
* 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers ([[phab:T279309|T279309]])
* 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 12:39 mutante: mw1412 through mw1428 - set to active in netbox ([[phab:T279309|T279309]])
* 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
* 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
* 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
* 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
* 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
* 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
* 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
* 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
* 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
* 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
* 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)


== 2021-04-16 ==
== 2021-07-15 ==
* 23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwdebug1003.eqiad.wmnet
* 23:32 brennen: checking stashbot: [[phab:T286756|T286756]]
* 23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1402.eqiad.wmnet with reason: REIMAGE
* 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: [[gerrit:704815{{!}}Fix creation of mw.Message objects (T286385)]] (duration: 00m 57s)
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1403.eqiad.wmnet with reason: REIMAGE
* 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # [[phab:T285811|T285811]]
* 23:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1402.eqiad.wmnet with reason: REIMAGE
* 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # [[phab:T285811|T285811]]
* 23:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mwdebug1003.eqiad.wmnet
* 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # [[phab:T285811|T285811]]
* 23:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwdebug1003.eqiad.wmnet
* 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
* 23:47 mutante: decom'ing mwdebug1003, stretch VM created in [[phab:T267248|T267248]]
* 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 23:39 mutante: reimaging last 3 remaining stretch appservers with buster, mw1307, mw1402, mw1403
* 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki [[phab:T284928|T284928]]
* 23:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1402-1403].eqiad.wmnet with reason: reimage
* 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
* 23:36 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1402-1403].eqiad.wmnet with reason: reimage
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 21:08 ejegg: updated fundraising python tools from {{Gerrit|ef54260b0d}} to {{Gerrit|3d950fffbd}}
* 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 20:40 Trey314159: reindexing wikidata on cloudelastic... AGAIN ([[phab:T274200|T274200]])
* 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 17:48 ryankemper: [[phab:T267927|T267927]] Transferring from `wdqs2008`->`wdqs2003` to resolve the data corruption on `wdqs2003`
* 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 17:47 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 17:41 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1020.wikimedia.org with reason: REIMAGE
* 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 17:39 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1020.wikimedia.org with reason: REIMAGE
* 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 17:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.wikimedia.org with reason: REIMAGE
* 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
* 17:37 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.wikimedia.org with reason: REIMAGE
* 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
* 17:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1017.wikimedia.org with reason: REIMAGE
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 17:35 mutante: depooling mwdebug1003 (stretch VM, will be removed), mwdebug1001/1002 (buster) and unchanged
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 17:34 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mwdebug1003.eqiad.wmnet
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 17:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.wikimedia.org with reason: REIMAGE
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 17:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1017.wikimedia.org with reason: REIMAGE
* 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:31 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.wikimedia.org with reason: REIMAGE
* 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 06s)
* 17:03 ryankemper: [[phab:T267927|T267927]] Pooled `wdqs1007`, `wdqs2003`, `wdqs1008`, `wdqs2004`
* 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 07s)
* 17:00 ryankemper: [[phab:T267927|T267927]] Following data transfers complete: `wdqs1004`->`wdqs1007`, `wdqs2001`->`wdqs2003`, `wdqs1003`->`wdqs1008`, `wdqs2008`->`wdqs2004`
* 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
* 17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:00 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
* 16:59 ryankemper@cumin2001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
* 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 16:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
* 16:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 16:09 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 15:57 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
* 15:43 urbanecm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 16:40 ejegg: updated payments-wiki from {{Gerrit|d9892207c1}} to {{Gerrit|844b59ee42}}
* 15:43 urbanecm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 15:31 urbanecm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 15:31 urbanecm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 16:27 ejegg: updated fundraising CiviCRM from {{Gerrit|e0d53c92b5}} to {{Gerrit|20e9ef6bbb}}
* 15:22 urbanecm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 16:24 ejegg: updated payments-wiki from {{Gerrit|0e7800027a}} to {{Gerrit|844b59ee42}}
* 14:59 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2023.codfw.wmnet
* 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 14:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on restbase-dev1006.eqiad.wmnet with reason: restarting for kernel update
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on restbase-dev1006.eqiad.wmnet with reason: restarting for kernel update
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on restbase-dev[1005-1006].eqiad.wmnet with reason: restarting for kernel update
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on restbase-dev[1005-1006].eqiad.wmnet with reason: restarting for kernel update
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:50 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2023.codfw.wmnet
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2022.codfw.wmnet
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:43 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
* 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704773{{!}}flaggedrevs: Allow admins of idwiki to change stablesettings (T268317)]], try II (duration: 01m 05s)
* 14:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2021.codfw.wmnet
* 15:03 Amir1: temporary becoming admin on idwiki to debug [[phab:T268317|T268317]]
* 14:31 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2021.codfw.wmnet
* 15:02 moritzm: installing nginx security updates on ms-fe*
* 14:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2020.codfw.wmnet
* 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:18 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2020.codfw.wmnet
* 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2019.codfw.wmnet
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 12:59 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2019.codfw.wmnet
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 12:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2018.codfw.wmnet
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 12:48 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2018.codfw.wmnet
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 12:47 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:47 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2017.codfw.wmnet
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:41 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2017.codfw.wmnet
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:37 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 12:25 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
* 12:22 jayme: updated envoyproxy to 1.15.4-1 on 'A:mw-canary or A:restbase-canary'
* 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 11:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2016.codfw.wmnet
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 11:02 moritzm: imported ferm 2.5.1-1+wmf1 to bullseye-wikimedia/main [[phab:T275873|T275873]]
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 11:01 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2016.codfw.wmnet
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:55 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2015.codfw.wmnet
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 10:49 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2015.codfw.wmnet
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 10:44 arturo: merging homer change to cr-eqiad ([[phab:T279342|T279342]])
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2014.codfw.wmnet
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 10:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2014.codfw.wmnet
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 10:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2013.codfw.wmnet
* 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
* 10:20 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2013.codfw.wmnet
* 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
* 10:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2012.codfw.wmnet
* 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
* 10:08 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2012.codfw.wmnet
* 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
* 10:08 jayme: updated envoyproxy to 1.15.4-1 on mw1325.eqiad.wmnet,restbase1026.eqiad.wmnet
* 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
* 10:05 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:04 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2011.codfw.wmnet
* 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
* 10:03 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
* 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:00 jayme: updated envoyproxy to 1.15.4-1 on mwdebug1001.eqiad.wmnet
* 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet
* 09:57 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2011.codfw.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 09:55 jayme: imported envoyproxy_1.15.4-1 to stretch-wikimedia - [[phab:T280317|T280317]]
* 13:05 mutante: mw1413 - pooling, was depooled but for unknown reason, dont see it in SAL, looks ok, scap pulled
* 09:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2010.codfw.wmnet
* 13:03 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1422.eqiad.wmnet
* 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15384 and previous config saved to /var/cache/conftool/dbconfig/20210416-093446-root.json
* 13:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:33 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2010.codfw.wmnet
* 13:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2009.codfw.wmnet
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 09:21 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2009.codfw.wmnet
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15383 and previous config saved to /var/cache/conftool/dbconfig/20210416-091942-root.json
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:13 jayme: imported envoyproxy_1.15.4-1 to buster-wikimedia - [[phab:T280317|T280317]]
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 09:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 12:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[3-5].eqiad.wmnet
* 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15380 and previous config saved to /var/cache/conftool/dbconfig/20210416-090438-root.json
* 12:54 mutante: mw1423, mw1424, mw1425 - pooled as new API servers
* 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15374 and previous config saved to /var/cache/conftool/dbconfig/20210416-084935-root.json
* 12:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P15373 and previous config saved to /var/cache/conftool/dbconfig/20210416-083431-root.json
* 12:53 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[3-5].eqiad.wmnet
* 07:53 elukey: run reprepro --delete clearvanished on apt1001 to clear all cloudera packages