Difference between revisions of "Server Admin Log"

imported>Stashbot
(marostegui: Upgrade and reboot db1085 T258360)
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(334 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== 2020-07-19 ==
== 2021-08-03 ==
* 19:16 marostegui: Upgrade and reboot db1085 [[phab:T258360|T258360]]
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:57 marostegui: Start mysql on db1082 [[phab:T258336|T258336]]
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:51 marostegui: Upgrade and reboot db1082 [[phab:T258336|T258336]]
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 18:45 cdanis@cumin1001: dbctl commit (dc=all): 'db1085 also crashed', diff saved to https://phabricator.wikimedia.org/P11952 and previous config saved to /var/cache/conftool/dbconfig/20200719-184511-cdanis.json
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 Urbanecm: Run mwscript emptyUserGroup.php --wiki=testwiki contestadmin ([[phab:T256555|T256555]])
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master [[phab:T287574|T287574]]
* 09:12 moritzm: installing php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2020-07-18 ==
== 2021-08-02 ==
* 21:41 shdubsh: restart logstash on logstash200[456]
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:14 shdubsh: bounce logstash on logstash1007
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 21:10 shdubsh: bounce logstash on logstash1008
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:06 shdubsh: bounce logstash on logstash1009
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:52 marostegui: Due to db1082 crash there will be replication lag on s5 on labsdb hosts - [[phab:T258336|T258336]]
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 20:37 cdanis@cumin1001: dbctl commit (dc=all): 'depool db1082, it crashed', diff saved to https://phabricator.wikimedia.org/P11951 and previous config saved to /var/cache/conftool/dbconfig/20200718-203704-cdanis.json
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:13 dpifke: Performing one-time expiration of ArcLamp files older than 40 days (normal retention is 45 days), to solve disk space issue until either Ganeti issue is solved or compressed logfile support is merged.
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 21:31 tzatziki: removing 1 file for legal compliance
* 21:16 tzatziki: removing 7 files for legal compliance
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 19:00 urbanecm: Morning B&C window completed
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 12:20 mutante: gerrit servers: disabling puppet
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 11:27 hashar: restarting Jenkins on contint2001
* 11:27 hashar: restarting Jenkins on contint1001
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:13 urbanecm: EU B&C window completed
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 11:08 moritzm: installing openjdk-11 security updates
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 07:24 moritzm: installing libsndfile security updates on buster
* 07:12 moritzm: installing aspell security updates
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)


== 2020-07-17 ==
== 2021-07-31 ==
* 21:16 dpifke: Removing MongoDB packages and data from webperf1002.
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 17:39 dpifke@deploy1001: Finished deploy [performance/arc-lamp@a5d2fd3]: (no justification provided) (duration: 00m 05s)
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
* 17:38 dpifke@deploy1001: Started deploy [performance/arc-lamp@a5d2fd3]: (no justification provided)
* 13:53 akosiaris: powercycle kubernetes2002
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104', diff saved to https://phabricator.wikimedia.org/P11944 and previous config saved to /var/cache/conftool/dbconfig/20200717-122400-marostegui.json
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P11941 and previous config saved to /var/cache/conftool/dbconfig/20200717-120126-marostegui.json
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P11940 and previous config saved to /var/cache/conftool/dbconfig/20200717-115155-marostegui.json
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P11939 and previous config saved to /var/cache/conftool/dbconfig/20200717-113800-marostegui.json
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104', diff saved to https://phabricator.wikimedia.org/P11938 and previous config saved to /var/cache/conftool/dbconfig/20200717-113050-marostegui.json
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104', diff saved to https://phabricator.wikimedia.org/P11937 and previous config saved to /var/cache/conftool/dbconfig/20200717-112413-marostegui.json
* 09:15 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1280.eqiad.wmnet
* 09:12 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1280.eqiad.wmnet
* 08:48 moritzm: imported prometheus-atlas-exporter 1.0+git20191204.ffafab7-2 to buster-wikimedia [[phab:T247967|T247967]]
* 08:29 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 08:05 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 07:54 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104', diff saved to https://phabricator.wikimedia.org/P11936 and previous config saved to /var/cache/conftool/dbconfig/20200717-075124-marostegui.json
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1111', diff saved to https://phabricator.wikimedia.org/P11935 and previous config saved to /var/cache/conftool/dbconfig/20200717-074335-marostegui.json
* 07:34 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 07:34 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 07:33 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 07:33 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 07:32 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 07:30 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 06:30 XioNoX: rename msw1-codfw interface range
* 06:28 XioNoX: rename msw1-eqiad interface range
* 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P11934 and previous config saved to /var/cache/conftool/dbconfig/20200717-044748-marostegui.json
* 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092', diff saved to https://phabricator.wikimedia.org/P11933 and previous config saved to /var/cache/conftool/dbconfig/20200717-044658-marostegui.json


== 2020-07-16 ==
== 2021-07-30 ==
* 22:15 mutante: testreduce1001 manually git clone 'scandium' branch of integration/visualdiff into /srv/visualdiff ([[phab:T257906|T257906]])
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:54 crusnov@deploy1001: Finished deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 3 (duration: 01m 49s)
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 21:52 crusnov@deploy1001: Started deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 3
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 21:42 crusnov@deploy1001: Finished deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 2 (duration: 01m 33s)
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 21:41 crusnov@deploy1001: Started deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 2
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 21:40 crusnov@deploy1001: Finished deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 (duration: 01m 01s)
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 21:39 crusnov@deploy1001: Started deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 21:08 cstone: payments-wiki revision changed from {{Gerrit|91852dbc9b}} to {{Gerrit|bf91f8adff}}
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 20:32 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable client error logging on Catalan Wikipedia ([[phab:T258073|T258073]]) (duration: 00m 57s)
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 19:32 sbassett: Deployed mitigations for [[phab:T257687|T257687]]
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 19:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T248418|T248418]] TimedMediaHandler: Make videojs the only player on all group0 (duration: 00m 57s)
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 18:54 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 18:53 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 18:50 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 18:49 addshore: deployment windows finished with
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:46 addshore@deploy1001: Synchronized wmf-config/extension-list: [[gerrit:611393]] extension-list: Load WikibaseClient via JSON (duration: 00m 56s)
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:36 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:613226]] Wikibase: Always set wgWBRepoSettings idGeneratorSeparateDbConnection PT 2/2 (duration: 00m 56s)
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:35 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:613226]] Wikibase: Always set wgWBRepoSettings idGeneratorSeparateDbConnection PT 1/2 (duration: 00m 56s)
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 18:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:613165]] [[phab:T138104|T138104]] Wikibase: stop setting wmgWikibaseTmpSerializeEmptyListsAsObjects (duration: 00m 57s)
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:23 addshore@deploy1001: Synchronized wmf-config/config/incubatorwiki.yaml: [[gerrit:613199]] [[phab:T256957|T256957]] Move VisualEditor from beta to default on incubatorwiki PT2/2 (duration: 00m 57s)
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 18:22 addshore@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: [[gerrit:613199]] [[phab:T256957|T256957]] Move VisualEditor from beta to default on incubatorwiki PT1/2 (duration: 00m 56s)
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 18:20 addshore@deploy1001: Synchronized wmf-config/config/nlwikimedia.yaml: [[gerrit:613198]] [[phab:T256142|T256142]] Move VisualEditor from beta to default on nlwikimedia PT2/2 (duration: 00m 57s)
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 18:18 addshore@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: [[gerrit:613198]] [[phab:T256142|T256142]] Move VisualEditor from beta to default on nlwikimedia PT1/2 (duration: 00m 56s)
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 18:14 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:613164]] [[phab:T138104|T138104]] Wikibase: stop setting wgWBRepoSettings tmpSerializeEmptyListsAsObjects (duration: 00m 57s)
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:12 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:613192]] [[phab:T246420|T246420]] Enable limited-width layout for Modern Vector (duration: 00m 56s)
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 18:08 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:612870]] [[phab:T246977|T246977]] Disable affinity quicksurveys for the following wikis (duration: 00m 57s)
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 18:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 17:54 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 17:53 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 17:50 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 17:50 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 17:49 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 17:17 XioNoX: msw1-eqiad delete unused VC-ports
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 17:05 XioNoX: msw1-codfw - replace member-range with list of individual interfaces
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 16:45 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/: Backport: [[gerrit:613173{{!}}Re add OtherProjectsSidebarGenerator::buildProjectLinkSidebarFromItemId (T258184)]] (duration: 01m 02s)
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 16:11 effie: reboot rdb1009 - [[phab:T254990|T254990]]
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 16:06 effie: Reboot rdb1010 - [[phab:T254990|T254990]]
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 15:51 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/: Backport: [[gerrit:613170{{!}}Revert "Revert "Removes OtherProjectsSidebar hook"" (T258184)]] (duration: 01m 02s)
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 15:40 lucaswerkmeister-wmde@deploy1001: scap failed: average error rate on 7/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 15:15 akosiaris: lower codfw mobileapps kubernetes traffic to 10% [[phab:T218733|T218733]]. Will open up task for it
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 15:15 akosiaris@cumin1001: conftool action : set/weight=24; selector: dc=codfw,service=mobileapps,name=scb.*
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 15:07 XioNoX: repool eqsin - [[phab:T257154|T257154]]
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 15:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 15:00 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 14:54 XioNoX: load config on cr3-eqsin - [[phab:T257154|T257154]]
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 14:54 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/: Backport: [[gerrit:613167{{!}}Avoid trying to register wikibase.Site twice (T258065)]] (duration: 01m 03s)
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 14:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 14:31 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 14:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 14:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:23 moritzm: installing libsndfile security updates on stretch
* 14:12 moritzm: rebooting webperf hosts in eqiad for kernel update
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 14:09 XioNoX: upgrade junos on cr3-eqsin - [[phab:T257154|T257154]]
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 14:03 jayme: published image docker-registry.discovery.wmnet/envoy:1.14.4-1
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 13:47 XioNoX: remove nonstop-bridging from asw1-eqsin
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 13:36 XioNoX: power-off cr3-eqsin - [[phab:T257154|T257154]]
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 13:36 akosiaris: increase codfw mobileapps kubernetes traffic to 25% [[phab:T218733|T218733]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 13:35 akosiaris@cumin1001: conftool action : set/weight=8; selector: dc=codfw,service=mobileapps,name=scb.*
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 13:30 XioNoX: deactivate BGP groups IX/Transit/PyBal on cr3-eqsin - [[phab:T257154|T257154]]
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 13:27 moritzm: installing an-tool1008
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 13:23 XioNoX: depool eqsin for cr3 replacement - [[phab:T257154|T257154]]
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 13:13 volans@deploy1001: Finished deploy [homer/deploy@fcf4332]: Force deploy of the homer plugin (duration: 01m 27s)
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 13:12 volans@deploy1001: Started deploy [homer/deploy@fcf4332]: Force deploy of the homer plugin
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 13:04 kormat: restarting tendril to pick up new mariadb config [[phab:T257816|T257816]]
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 13:02 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.41
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json
* 13:02 akosiaris: increase codfw mobileapps kubernetes traffic to 10% [[phab:T218733|T218733]]
* 13:01 akosiaris@cumin1001: conftool action : set/weight=24; selector: dc=codfw,service=mobileapps,name=scb.*
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092', diff saved to https://phabricator.wikimedia.org/P11926 and previous config saved to /var/cache/conftool/dbconfig/20200716-125643-marostegui.json
* 12:56 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR607011 (duration: 04m 32s)
* 12:52 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR607011
* 12:42 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR607011 (duration: 03m 42s)
* 12:38 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR607011
* 12:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:36 akosiaris@cumin1001: conftool action : set/weight=50; selector: dc=codfw,service=mobileapps,name=scb.*
* 12:35 akosiaris: increase codfw mobileapps kubernetes traffic to 5% [[phab:T218733|T218733]]
* 12:35 akosiaris: increase codfw mobileapps kubernetes traffic to 5%
* 12:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:22 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:12 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:12 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:08 jayme: updated envoyproxy to 1.14.4-1 on mw-canary and restbase-canary
* 11:44 XioNoX: remove BGP to AS396253 in eqdfw (peer left the IX)
* 11:26 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/UrlShortener/includes/UrlShortenerUtils.php: [[phab:T258134|T258134]] Fix config variables regex concatenation (duration: 01m 05s)
* 11:23 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[phab:T254315|T254315]] [[gerrit:612670]] Wikibase: remove wmgWikibaseLocalEntitySourceName (duration: 01m 05s)
* 11:18 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T254315|T254315]] [[phab:T257266|T257266]] [[gerrit:609988]] Wikidata client wikis: Define entity sources configuration (take 3) (duration: 01m 08s)
* 10:17 jbond42: upgrade to hiera5
* 10:08 jbond42: disable puppet for hiera5 deployment
* 09:37 jayme: updated envoyproxy to 1.14.4-1 on mw1325.eqiad.wmnet and restbase1026.eqiad.wmnet
* 09:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:15 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 09:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:15 moritzm: rebooting flowspec1001
* 08:52 jayme: updated envoyproxy to 1.14.4-1 on mwdebug1001.eqiad.wmnet
* 08:41 moritzm: installing sqlite3 security updates
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2081', diff saved to https://phabricator.wikimedia.org/P11924 and previous config saved to /var/cache/conftool/dbconfig/20200716-083954-marostegui.json
* 08:35 XioNoX: Remove PIM/IGMP related CR stanza (acls) - [[phab:T257573|T257573]]
* 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:26 moritzm: installing dbus security updates
* 08:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:24 XioNoX: remove igmp-snooping from access switches - [[phab:T257573|T257573]]
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:15 moritzm: installing python-urllib3 security updates
* 08:15 XioNoX: remove PIM config from eqord/eqdfw/knams routers - [[phab:T257573|T257573]]
* 08:14 XioNoX: remove PIM config from eqiad routers - [[phab:T257573|T257573]]
* 08:11 XioNoX: remove PIM config from esams routers - [[phab:T257573|T257573]]
* 08:09 XioNoX: remove PIM config from eqsin routers - [[phab:T257573|T257573]]
* 08:08 jbond42: update mail delivery for phabricator to use phabricator.discovery.wmnet cname
* 08:07 XioNoX: remove PIM config from codfw routers - [[phab:T257573|T257573]]
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P11923 and previous config saved to /var/cache/conftool/dbconfig/20200716-080613-marostegui.json
* 08:03 XioNoX: remove PIM config from ulsfo routers - [[phab:T257573|T257573]]
* 07:41 jayme: imported envoyproxy_1.14.4-1 to stretch-wikimedia
* 07:31 jayme: imported envoyproxy_1.14.4-1 to buster-wikimedia
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1131', diff saved to https://phabricator.wikimedia.org/P11922 and previous config saved to /var/cache/conftool/dbconfig/20200716-072838-marostegui.json
* 07:25 marostegui: Drop database reviewdb-test [[phab:T255715|T255715]]
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1131', diff saved to https://phabricator.wikimedia.org/P11921 and previous config saved to /var/cache/conftool/dbconfig/20200716-070331-marostegui.json
* 06:40 XioNoX: remove peering with AS8403 in eqsin (peer left the IX)
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1131', diff saved to https://phabricator.wikimedia.org/P11920 and previous config saved to /var/cache/conftool/dbconfig/20200716-051342-marostegui.json
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1131', diff saved to https://phabricator.wikimedia.org/P11919 and previous config saved to /var/cache/conftool/dbconfig/20200716-051109-marostegui.json


== 2020-07-15 ==
== 2021-07-29 ==
* 23:54 eileen: tools revision changed from {{Gerrit|7b6018a16e}} to {{Gerrit|711d671600}}
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 23:50 eileen: process-control config revision is {{Gerrit|1fc4a9686d}}
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 23:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 23:04 bd808: tools.admin Removed valhallasw from maintainers ([[phab:T255697|T255697]])
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 23:02 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 22:58 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 22:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 22:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 22:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 22:29 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 22:29 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 22:27 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 22:21 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 22:21 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 22:10 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 18:16 brennen: restarting jenkins for upgrade
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 18:00 mutante: DNS - new language 'avk' has been added - This language is called Kotava and is "a proposed international auxiliary language (IAL) that focuses especially on the principle of cultural neutrality". Learn more at https://en.wikipedia.org/wiki/Kotava
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 17:32 mutante: puppetmaster - revoking cert for planet.discovery.wmnet, add planet.wikimedia.org, remove planet.svc records, remove specific and outdated hostnames ([[phab:T257840|T257840]])
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 16:11 moritzm: uploaded jenkins 2.235.2 to thirdparty/ci for stretch/buster [[phab:T257614|T257614]]
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 15:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 15:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 15:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 15:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 15:20 moritzm: rebooting webperf* hosts for kernel update
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:58 addshore@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/repo: [[gerrit:612723]] Stop checking if WikibaseLib is loaded [[phab:T258062|T258062]] (already on mwmaint1002) (duration: 01m 08s)
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:51 addshore: pulled https://gerrit.wikimedia.org/r/612723 onto mwmaint1002 ahead of syncing everywhere (and CI finishing)
* 14:11 vgutierrez: restart pybal on lvs2009
* 14:37 ema: A:cp: upgrade purged to 0.17 [[phab:T257573|T257573]]
* 14:09 vgutierrez: restart pybal on lvs2010
* 14:30 ema: upload purged 0.17 to buster-wikimedia [[phab:T257573|T257573]]
* 14:07 vgutierrez: restart pybal on lvs2008
* 14:28 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add exceptional wikitech VE/Parsoid config [[phab:T241961|T241961]] (duration: 01m 04s)
* 14:05 vgutierrez: restart pybal on lvs2007
* 14:26 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add exceptional wikitech VE/Parsoid config [[phab:T241961|T241961]] (duration: 01m 05s)
* 13:59 vgutierrez: restart pybal on lvs1014
* 14:25 gehel: repooling wdqs1006 - caught up on lag
* 13:55 vgutierrez: restart pybal on lvs1015
* 14:12 akosiaris: increase codfw mobileapps kubernetes traffic to 2% [[phab:T218733|T218733]]
* 13:52 _joe_: restarting pybal on lvs1016
* 14:10 akosiaris@cumin1001: conftool action : set/weight=132; selector: dc=codfw,service=mobileapps,name=scb.*
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 13:58 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/UrlShortener/includes/UrlShortenerUtils.php: [[phab:T258056|T258056]] Add temporary fix to ensure array is passed to array_map() (duration: 01m 08s)
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 13:54 akosiaris: pool kubernetes nodes for mobileapps in codfw
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 13:53 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=mobileapps,name=kubernetes.*
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 13:53 akosiaris@cumin1001: conftool action : set/weight=264; selector: dc=codfw,service=mobileapps,name=scb.*
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 13:51 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=kubernetes.*
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 13:04 jforrester@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.41 (duration: 01m 05s)
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 13:03 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.41
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 11:59 addshore: deploy window closed / done :)
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 11:57 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:609987]] Commons: Define entity sources configuration (take 2) [[phab:T254315|T254315]] (duration: 01m 03s)
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 11:36 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:612668]] Wikibase test: Client local entity sources are always testwikidata [[phab:T254315|T254315]] (duration: 01m 05s)
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 11:27 addshore@deploy1001: Synchronized wmf-config: [[phab:T254315|T254315]] [[gerrit:612669]] Wikidata test: Split client db lists. PT2/2 (duration: 01m 06s)
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 11:26 addshore@deploy1001: Synchronized dblists/wikidataclient.dblist: [[phab:T254315|T254315]] [[gerrit:612669]] Wikidata test: Split client db lists. PT1/2 (duration: 01m 05s)
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 11:16 XioNoX: remove as-path prepending in esams
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:11 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: LABS [[gerrit:612667]] Wikibase labs: All client "local" entity sources are wikidata [[phab:T254315|T254315]] (duration: 01m 04s)
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:08 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:612666]] Wikibase: Split localEntitySourceName config for repo and client [[phab:T254315|T254315]] (duration: 01m 16s)
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 11:05 XioNoX: re-enable ping offload in esams
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 11:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 11:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 10:56 XioNoX: disable ping offload in esams
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:55 XioNoX: re-enable ping offload in codfw
* 07:52 moritzm: restarting Tomcat on idp-test
* 10:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 10:45 XioNoX: disable ping offload in codfw
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 10:44 XioNoX: re-enable ping offload in eqiad
* 10:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:31 XioNoX: disable ping offload in eqiad
* 10:31 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 10:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 10:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 10:30 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11916 and previous config saved to /var/cache/conftool/dbconfig/20200715-102605-marostegui.json
* 10:20 jayme: updating python3-docker-report to 0.0.5-1 on deneb
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11915 and previous config saved to /var/cache/conftool/dbconfig/20200715-100855-marostegui.json
* 10:07 jayme: imported docker-report_0.0.5-1 to buster-wikimedia
* 09:48 marostegui: Deploy schema change on s8 codfw master, lag will appear on codfw [[phab:T256685|T256685]]
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11914 and previous config saved to /var/cache/conftool/dbconfig/20200715-094226-marostegui.json
* 09:22 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:21 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 09:19 akosiaris: deploy mobileapps in kubernetes to talk HTTPS to the mw API
* 09:10 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:10 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:07 akosiaris: Correction: deploy eventgate-analytics-external in staging, eqiad, codfw for switching to using discovery records and HTTPS for talking to the API
* 09:06 akosiaris: deploy eventgate-analytics in staging, eqiad, codfw for switching to using discovery records and HTTPS for talking to the API
* 09:06 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:06 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P11913 and previous config saved to /var/cache/conftool/dbconfig/20200715-090545-marostegui.json
* 09:04 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:04 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11912 and previous config saved to /var/cache/conftool/dbconfig/20200715-085032-marostegui.json
* 08:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:19 moritzm: piwik.wikimedia.org switched to CAS authentication
* 08:19 elukey: move piwik.wikimedia.org to CAS (idp.wikimedia.org)
* 07:29 XioNoX: delete deprecated AS3209 AMS-IX router
* 06:59 dcausse: depooling wdqs1006 (high lag)
* 06:09 marostegui: Stop replication on db1120 to avoid having 10.4 -> 10.1 replication for long [[phab:T254871|T254871]]
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 for reimage [[phab:T254871|T254871]]', diff saved to https://phabricator.wikimedia.org/P11911 and previous config saved to /var/cache/conftool/dbconfig/20200715-060649-marostegui.json
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1103 to x1 master [[phab:T254871|T254871]]', diff saved to https://phabricator.wikimedia.org/P11910 and previous config saved to /var/cache/conftool/dbconfig/20200715-060145-marostegui.json
* 06:00 marostegui: Starting x1 failover from db1120 to db1103 - [[phab:T254871|T254871]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 ', diff saved to https://phabricator.wikimedia.org/P11909 and previous config saved to /var/cache/conftool/dbconfig/20200715-052939-marostegui.json
* 04:46 marostegui: Start x1 pre failover steps [[phab:T254871|T254871]]
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1103 weight to 0 before the switchover [[phab:T254871|T254871]]', diff saved to https://phabricator.wikimedia.org/P11908 and previous config saved to /var/cache/conftool/dbconfig/20200715-044432-marostegui.json
* 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1135', diff saved to https://phabricator.wikimedia.org/P11907 and previous config saved to /var/cache/conftool/dbconfig/20200715-044332-marostegui.json
* 01:45 eileen: tools revision changed from {{Gerrit|a9e7dc1559}} to {{Gerrit|7b6018a16e}}
* 00:26 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@8f6f660]: 0.3.41 (duration: 15m 10s)
* 00:11 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8f6f660]: 0.3.41


== 2020-07-14 ==
== 2021-07-28 ==
* 19:52 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/vendor/wikimedia/parsoid/: [[phab:T252448|T252448]] [[phab:T255190|T255190]] Bump Parsoid to v0.12.0-a23 (duration: 01m 06s)
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 18:13 ryankemper: all long-running elasticsearch reindex jobs are complete
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 18:09 jforrester@deploy1001: Synchronized dblists/: [[phab:T32405|T32405]] [[phab:T254287|T254287]] Remove the mobilemainpagelegacy dblist (duration: 01m 04s)
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 18:07 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: [[phab:T32405|T32405]] [[phab:T254287|T254287]] Stop loading the mobilemainpagelegacy dblist (duration: 01m 05s)
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]]
* 18:05 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T32405|T32405]] [[phab:T254287|T254287]] Stop varying wgMFSpecialCaseMainPage (duration: 01m 05s)
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 15:56 elukey: upgrade spark2 on stat100x to 2.4.4-bin-hadoop2.6-3
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 15:40 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 15:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 15:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 15:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 15:13 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 15:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 15:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 14:57 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/skins/Vector/includes/SkinVector.php: [[phab:T257914|T257914]] Restore div wrapper around print footer (duration: 01m 03s)
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 14:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 14:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 14:48 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: Fix case of directory name (duration: 01m 05s)
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 14:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 14:48 moritzm: rebooting apt1001 for kernel update
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 14:42 jynus: stopping db1117:3322 (m2) replication temp. for otrs db cloning [[phab:T257928|T257928]]
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:40 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:26 oblivian@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 14:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:18 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:14 oblivian@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:13 andrewbogott: upgrading wikitech-static to mw 1.34.2
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:11 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 14:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 13:42 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112', diff saved to https://phabricator.wikimedia.org/P11900 and previous config saved to /var/cache/conftool/dbconfig/20200714-132823-marostegui.json
* 13:29 moritzm: installing python2.7 security updates on stretch
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11899 and previous config saved to /var/cache/conftool/dbconfig/20200714-132742-marostegui.json
* 13:08 moritzm: installing python3.5 security updates on stretch
* 13:27 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:24 jbond42: reboot dns1001
* 11:27 moritzm: installing nginx security updates on thumbor*
* 13:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 13:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 13:22 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 13:22 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1002.wikimedia.org
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 13:18 jbond42: reboot dns1002
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 13:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 13:18 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1002.wikimedia.org
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:16 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:13 jbond42: reboot dns2002
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:13 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 13:13 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 13:10 jbond42: reboot dns2001
* 08:27 Amir1: running several long-running queries against pc1007
* 13:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org
* 07:53 moritzm: installing aspell security updates on stretch
* 13:09 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 13:06 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 13:01 jbond42: rebooting dns3002
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 13:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 13:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 12:58 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 12:57 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: revert forcehttps after fixing [[phab:T257887|T257887]] (duration: 01m 02s)
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 12:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 12:24 jbond42: route ns0.wikimedia.org to codfw for reboot
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 12:20 moritzm: installing xen security updates (client-side tools/libs)
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php
* 12:19 jbond42: re-enable puppet fleet
* 12:07 jbond42: disable puppet fleet wide to reboot puppetdb's
* 12:07 jbond42: disable puppet to reboot puppetdb's
* 12:01 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.41
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for query plan checks [[phab:T238966|T238966]] ', diff saved to https://phabricator.wikimedia.org/P11898 and previous config saved to /var/cache/conftool/dbconfig/20200714-113612-marostegui.json
* 11:35 _joe_: restart pybal on lvs2009 [[phab:T257887|T257887]]
* 11:31 _joe_: restart pybal on lvs2010 [[phab:T257887|T257887]]
* 11:25 _joe_: restart pybal on lvs1015 [[phab:T257887|T257887]]
* 11:22 _joe_: restart pybal on lvs1016
* 11:15 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 11:03 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:59 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 10:56 volans@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp2005.codfw.wmnet
* 10:52 volans: powerdown wtp2005, hardware issue - [[phab:T257903|T257903]]
* 10:47 volans@cumin1001: conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet
* 10:45 jiji@cumin1001: conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet,service=parsoid-php
* 10:45 jiji@cumin1001: conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet,service=parsoid
* 10:45 effie: depool wtp2005
* 10:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 10:39 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 10:39 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 10:32 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 10:18 oblivian@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 10:14 James_F: Running AbuseFilter's updateVarDumps for group1 [[phab:T246539|T246539]]
* 10:13 oblivian@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 10:10 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 10:10 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P11897 and previous config saved to /var/cache/conftool/dbconfig/20200714-094449-marostegui.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075', diff saved to https://phabricator.wikimedia.org/P11896 and previous config saved to /var/cache/conftool/dbconfig/20200714-094354-marostegui.json
* 09:39 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: Add REL1_35 as a candidate release (duration: 01m 06s)
* 09:05 jforrester@deploy1001: Finished scap: Re-re-start full scap to push out wmf.41 and switch testwikis to it [[phab:T256669|T256669]] (duration: 51m 41s)
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for PDU upgrade [[phab:T257871|T257871]]', diff saved to https://phabricator.wikimedia.org/P11895 and previous config saved to /var/cache/conftool/dbconfig/20200714-084033-marostegui.json
* 08:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:30 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:30 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:30 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:13 jforrester@deploy1001: Started scap: Re-re-start full scap to push out wmf.41 and switch testwikis to it [[phab:T256669|T256669]]
* 08:05 akosiaris: restart pybal on lvs2009
* 08:03 _joe_: restart pybal on lvs1016
* 08:02 akosiaris: restart pybal on lvs2007
* 08:01 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: name=restbase2009.codfw.wmnet
* 08:00 _joe_: restart pybal on lvs1015
* 08:00 akosiaris: restart pybal on lvs2010 after merging https://gerrit.wikimedia.org/r/612487
* 07:52 jforrester@deploy1001: sync aborted: Re-start full scap to push out wmf.41 and switch testwikis to it [[phab:T256669|T256669]] (duration: 02m 14s)
* 07:50 jforrester@deploy1001: Started scap: Re-start full scap to push out wmf.41 and switch testwikis to it [[phab:T256669|T256669]]
* 07:48 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: revert forcehttps in an attempt to fix [[phab:T257887|T257887]] (duration: 01m 06s)
* 07:32 oblivian@deploy1001: sync-file aborted: revert forcehttps in an attempt to fix [[phab:T257887|T257887]] (duration: 00m 20s)
* 07:31 oblivian@deploy1001: Scap failed!: 7/9 canaries failed their endpoint checks(http://en.wikipedia.org)
* 07:27 moritzm: installing libtasn1-6 security updates
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075', diff saved to https://phabricator.wikimedia.org/P11894 and previous config saved to /var/cache/conftool/dbconfig/20200714-071233-marostegui.json
* 07:04 marostegui: Drop gerrit, gerritro, gerrittest users from m2 databases - [[phab:T255715|T255715]]
* 06:58 marostegui: Stop mysql on db1131 for HW maintenance
* 06:56 oblivian@deploy2001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 06:54 jforrester@deploy1001: scap failed: RuntimeError Scap failed!: 9/9 canaries failed their endpoint checks(http://en.wikipedia.org) (duration: 24m 59s)
* 06:54 jforrester@deploy1001: Scap failed!: 9/9 canaries failed their endpoint checks(http://en.wikipedia.org)
* 06:53 oblivian@deploy2001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 06:53 marostegui: Deploy MCR schema change on s5 primary master [[phab:T238966|T238966]]
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078', diff saved to https://phabricator.wikimedia.org/P11893 and previous config saved to /var/cache/conftool/dbconfig/20200714-065229-marostegui.json
* 06:29 jforrester@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.41
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease a bit db1088 load', diff saved to https://phabricator.wikimedia.org/P11891 and previous config saved to /var/cache/conftool/dbconfig/20200714-051551-marostegui.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for HW maintenance', diff saved to https://phabricator.wikimedia.org/P11890 and previous config saved to /var/cache/conftool/dbconfig/20200714-050931-marostegui.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 from api', diff saved to https://phabricator.wikimedia.org/P11889 and previous config saved to /var/cache/conftool/dbconfig/20200714-050912-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1093 to s6 master and remove read-only from s6 [[phab:T257253|T257253]]', diff saved to https://phabricator.wikimedia.org/P11888 and previous config saved to /var/cache/conftool/dbconfig/20200714-050157-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s6 as read-only for maintenance [[phab:T257253|T257253]]', diff saved to https://phabricator.wikimedia.org/P11887 and previous config saved to /var/cache/conftool/dbconfig/20200714-050039-marostegui.json
* 05:00 marostegui: Starting s6 failover from db1131 to db1093 - [[phab:T257253|T257253]]
* 04:59 James_F: 1.35.0-wmf.41 branched at {{Gerrit|7d04152db4f8ea9a459511bed8117101d9bb4602}}
* 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P11886 and previous config saved to /var/cache/conftool/dbconfig/20200714-043907-marostegui.json
* 04:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 in preparation for failover', diff saved to https://phabricator.wikimedia.org/P11885 and previous config saved to /var/cache/conftool/dbconfig/20200714-041548-marostegui.json
* 04:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11884 and previous config saved to /var/cache/conftool/dbconfig/20200714-041440-marostegui.json
* 01:23 ryankemper: Started long-running Elasticsearch reindex of `eqiad`, `codfw`, and `cloudelastic`. tmux session `reindex` under `ryankemper` on `mwmaint1002`
* 01:20 cdanis: ❌cdanis@lvs1015.eqiad.wmnet ~ 🕤🍺 sudo systemctl restart pybal.service
* 01:15 cdanis: ✔️ cdanis@lvs1016.eqiad.wmnet ~ 🕘🍺 sudo systemctl restart pybal.service
* 01:14 cdanis: ✔️ cdanis@lvs2009.codfw.wmnet ~ 🕘🍺 sudo systemctl restart pybal.service
* 01:01 cdanis: ✔️ cdanis@lvs2010.codfw.wmnet ~ 🕘🍺 sudo systemctl restart pybal.service


== 2020-07-13 ==
== 2021-07-27 ==
* 23:06 mutante: releases* delete /usr/local/sbin/sync-* scripts created by rsync::quickdatacopy and let puppet recreate the ones still needed
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 22:27 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I80ca62643f5c}} (duration: 00m 58s)
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 20:12 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@1edde21]: airflow: ship_to_es: Implement multi-index understanding (duration: 00m 29s)
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 20:12 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1edde21]: airflow: ship_to_es: Implement multi-index understanding
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 20:03 mutante: rsynced reprepro data from releases1001 to releases1002, releases2002
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 19:50 eileen: disable target smart job; process-control config revision is {{Gerrit|b00e7680ca}}
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 19:48 milimetric@deploy1001: Finished deploy [analytics/refinery@de0a1f1] (thin): Regular analytics weekly train THIN [analytics/refinery@de0a1f1] (duration: 00m 07s)
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 19:47 milimetric@deploy1001: Started deploy [analytics/refinery@de0a1f1] (thin): Regular analytics weekly train THIN [analytics/refinery@de0a1f1]
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 19:47 milimetric@deploy1001: Finished deploy [analytics/refinery@de0a1f1]: Regular analytics weekly train [analytics/refinery@de0a1f1] (duration: 06m 41s)
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 19:41 milimetric@deploy1001: Started deploy [analytics/refinery@de0a1f1]: Regular analytics weekly train [analytics/refinery@de0a1f1]
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 19:39 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 19:33 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I1a12124f1811e9a}} (duration: 00m 57s)
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:53 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T248343|T248343]] Don't use the 'zeroconf' configuration for VisualEditor (duration: 00m 55s)
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:43 dcausse: BACON done
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 18:40 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T257745|T257745]]: Add rollbacker to elwiki (duration: 00m 56s)
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 18:26 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T250810|T250810]]: Set proper language code for some wikis (duration: 00m 56s)
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 18:18 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T256928|T256928]]: Scale largest shards to be closer to 30GB (duration: 00m 56s)
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 16:17 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 16:17 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 15:56 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:610265{{!}}Load WikibaseClient using extension registration in beta (T257435)]] (duration: 00m 55s)
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P11882 and previous config saved to /var/cache/conftool/dbconfig/20200713-155240-marostegui.json
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11881 and previous config saved to /var/cache/conftool/dbconfig/20200713-154847-marostegui.json
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 15:39 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 15:35 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 15:30 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 14:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting DiscussionToolsEnableVisual, default value (duration: 00m 57s)
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 14:17 moritzm: removing lilypond from production [[phab:T257066|T257066]]
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11880 and previous config saved to /var/cache/conftool/dbconfig/20200713-133604-marostegui.json
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1082', diff saved to https://phabricator.wikimedia.org/P11879 and previous config saved to /var/cache/conftool/dbconfig/20200713-133535-marostegui.json
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 13:05 kormat@cumin1001: dbctl commit (dc=all): 'Fully repool es1022, and set es1020 to zero weight [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11878 and previous config saved to /var/cache/conftool/dbconfig/20200713-130532-kormat.json
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 12:08 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling es1022 after reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11873 and previous config saved to /var/cache/conftool/dbconfig/20200713-120818-kormat.json
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 11:49 Urbanecm: Password reset for User:Alert5 ([[phab:T257806|T257806]])
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 11:44 akosiaris: repool ganeti1007 [[phab:T244530|T244530]]. Start emptying ganeti1008
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 11:08 Urbanecm: EU B&C done
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|896c042296b4e1f5d88f786981537655e5d9fea9}}: Enable SandboxLink extension in trwiki ([[phab:T256782|T256782]]) (duration: 00m 56s)
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:612175{{!}} Bumping portals to master (612175)]] (duration: 00m 56s)
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:612175{{!}} Bumping portals to master (612175)]] (duration: 00m 56s)
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 09:42 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 08:58 ema: cp: rolling ats-backend-restart to apply SyslogIdentifier changes -> https://gerrit.wikimedia.org/r/c/operations/puppet/+/611311
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 08:57 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T248343|T248343]] Explicitly set visualeditor-enable to 0 when non-default (duration: 00m 57s)
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 08:44 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1022 for reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11871 and previous config saved to /var/cache/conftool/dbconfig/20200713-084449-kormat.json
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1093', diff saved to https://phabricator.wikimedia.org/P11870 and previous config saved to /var/cache/conftool/dbconfig/20200713-083902-marostegui.json
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 08:34 kormat@cumin1001: dbctl commit (dc=all): 'Add weight to es1020, reduce weight on es1022 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11869 and previous config saved to /var/cache/conftool/dbconfig/20200713-083414-kormat.json
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 08:20 kormat: reimaging es1022 [[phab:T257284|T257284]]
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 06:54 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 06:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 06:52 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 06:51 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 06:50 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 06:16 marostegui: Reverse gerrit password on m2 master - [[phab:T255715|T255715]]
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1093', diff saved to https://phabricator.wikimedia.org/P11868 and previous config saved to /var/cache/conftool/dbconfig/20200713-060410-marostegui.json
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1093', diff saved to https://phabricator.wikimedia.org/P11867 and previous config saved to /var/cache/conftool/dbconfig/20200713-055422-marostegui.json
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for upgrade', diff saved to https://phabricator.wikimedia.org/P11866 and previous config saved to /var/cache/conftool/dbconfig/20200713-054840-marostegui.json
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 05:34 marostegui: Deploy schema change on s3 codfw master, lag will appear on codfw [[phab:T253276|T253276]]
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 05:30 marostegui: Stop replication on db1082 for schema change and triggers removal [[phab:T238966|T238966]]
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P11865 and previous config saved to /var/cache/conftool/dbconfig/20200713-052928-marostegui.json
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for innodb compression', diff saved to https://phabricator.wikimedia.org/P11864 and previous config saved to /var/cache/conftool/dbconfig/20200713-051428-marostegui.json
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 14:33 mmandere: depool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:11 moritzm: installing aspell security updates
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support.  Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 11:23 Lucas_WMDE: EU backport+config window done
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 08:57 _joe_: repooling mw225[12] for apis
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 08:36 jynus: reenabled puppet on mwmaint1002
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 07:52 jynus: disabling puppet on mwmaint1002
* 07:14 moritzm: installing krb security updates on buster
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json


== 2020-07-11 ==
== 2021-07-26 ==
* 19:16 qchris: Restarting Gerrit on gerrit1001 to switch to new gerrit.war and zuul plugin
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 19:16 qchris@deploy1001: Finished deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit1001 (duration: 00m 07s)
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}}
* 19:15 qchris@deploy1001: Started deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit1001
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 19:08 qchris: Restarting Gerrit on gerrit2001 to switch to new gerrit.war and zuul plugin
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 18:55 qchris@deploy1001: Finished deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit2001 (duration: 00m 10s)
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 18:55 qchris@deploy1001: Started deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit2001
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 16:06 legoktm: depooled mw2336.codfw.wmnet, mgmt is down too. [[phab:T287394|T287394]]
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 06:39 moritzm: installing krb5 security updates
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki


== 2020-07-10 ==
== 2021-07-24 ==
* 21:52 ryankemper: Started long-running reindex of Elasticsearch indices in `eqiad`, `codfw`, and `dewiki` on `mwmaint1002` under tmux session `reindex` for user `ryankemper`
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
* 20:26 jgleeson: updated fundraising-tools from {{Gerrit|08ba1f6177}} to {{Gerrit|f8e424fe32}}
* 19:02 mutante: removing firewall hole for gerrit -> mysql servers on dbproxy servers for misc db's
* 18:44 mutante: kubernetes1004 - started nagios-nrpe-server
* 17:57 ebernhardson: change loginwiki password for Cindy-the-browser-test-bot, no email account was associated to allow for normal reset.
* 17:05 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I63fcea7737}} (duration: 00m 57s)
* 16:16 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
* 15:57 milimetric@deploy1001: Finished deploy [analytics/refinery@4d40145] (thin): Update EventLogging refine whitelist (THIN) (duration: 00m 08s)
* 15:56 milimetric@deploy1001: Started deploy [analytics/refinery@4d40145] (thin): Update EventLogging refine whitelist (THIN)
* 15:44 milimetric@deploy1001: Finished deploy [analytics/refinery@4d40145]: Update EventLogging refine whitelist (duration: 15m 17s)
* 15:30 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 15:29 milimetric@deploy1001: Started deploy [analytics/refinery@4d40145]: Update EventLogging refine whitelist
* 15:19 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 14:39 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 14:37 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 14:30 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 13:41 godog: bounce ms-be1037, not quite responsive
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110', diff saved to https://phabricator.wikimedia.org/P11860 and previous config saved to /var/cache/conftool/dbconfig/20200710-123604-marostegui.json
* 12:20 reedy@deploy1001: Synchronized php-1.35.0-wmf.40/extensions/Score/: Make Score errors use a specific css class (duration: 00m 58s)
* 10:21 kormat@cumin1001: dbctl commit (dc=all): 'Finish repooling es1021, and remove weight from es1010 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11859 and previous config saved to /var/cache/conftool/dbconfig/20200710-102147-kormat.json
* 09:49 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling es1021 after reimage @ 50% [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11858 and previous config saved to /var/cache/conftool/dbconfig/20200710-094954-kormat.json
* 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P11857 and previous config saved to /var/cache/conftool/dbconfig/20200710-085157-marostegui.json
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P11856 and previous config saved to /var/cache/conftool/dbconfig/20200710-085112-marostegui.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1107', diff saved to https://phabricator.wikimedia.org/P11855 and previous config saved to /var/cache/conftool/dbconfig/20200710-085040-marostegui.json
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P11853 and previous config saved to /var/cache/conftool/dbconfig/20200710-082346-marostegui.json
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11852 and previous config saved to /var/cache/conftool/dbconfig/20200710-082329-marostegui.json
* 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:22 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:22 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11851 and previous config saved to /var/cache/conftool/dbconfig/20200710-080912-marostegui.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119', diff saved to https://phabricator.wikimedia.org/P11850 and previous config saved to /var/cache/conftool/dbconfig/20200710-080854-marostegui.json
* 08:09 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1021 for reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11849 and previous config saved to /var/cache/conftool/dbconfig/20200710-080843-kormat.json
* 08:01 kormat@cumin1001: dbctl commit (dc=all): 'Reset es2020/es2021 to correct weights after master switch [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11848 and previous config saved to /var/cache/conftool/dbconfig/20200710-080133-kormat.json
* 08:00 moritzm: installing cron security updates on jessie (stretch/buster already fixed)
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P11847 and previous config saved to /var/cache/conftool/dbconfig/20200710-075608-marostegui.json
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11846 and previous config saved to /var/cache/conftool/dbconfig/20200710-075500-marostegui.json
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079', diff saved to https://phabricator.wikimedia.org/P11845 and previous config saved to /var/cache/conftool/dbconfig/20200710-075431-marostegui.json
* 07:44 kormat: reimaging es1021 to buster [[phab:T257284|T257284]]
* 07:43 kormat@cumin1001: dbctl commit (dc=all): 'Add weight to es1020, reduce weight on es1021 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11844 and previous config saved to /var/cache/conftool/dbconfig/20200710-074326-kormat.json
* 07:41 jbond@deploy1001: Finished deploy [librenms/librenms@0a88d64]: redeploy to [try and] fix php errors (duration: 00m 05s)
* 07:41 jbond@deploy1001: Started deploy [librenms/librenms@0a88d64]: redeploy to [try and] fix php errors
* 07:32 moritzm: installing e2fsprogs security updates on jessie (stretch/buster already fixed)
* 07:15 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 07:14 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 07:13 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P11843 and previous config saved to /var/cache/conftool/dbconfig/20200710-065751-marostegui.json
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311', diff saved to https://phabricator.wikimedia.org/P11841 and previous config saved to /var/cache/conftool/dbconfig/20200710-063818-marostegui.json
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1134', diff saved to https://phabricator.wikimedia.org/P11840 and previous config saved to /var/cache/conftool/dbconfig/20200710-063746-marostegui.json
* 06:35 marostegui: Compress InnoDB on db1124:3311 (Sanitarium - lag will appear on s1 on labsdb) - [[phab:T254462|T254462]]
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P11839 and previous config saved to /var/cache/conftool/dbconfig/20200710-044428-marostegui.json
* 01:44 mutante: LDAP - adding coka to wmde and nda ([[phab:T257038|T257038]])
* 00:47 Reedy: truncated labswiki.interwiki table (outdated and unnecessary)


== 2020-07-09 ==
== 2021-07-23 ==
* 23:10 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2c2dea832}} (duration: 00m 56s)
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 21:52 tgr: all sessions have been invalidated due to [[phab:T256395|T256395]]
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 20:58 eileen: https://phabricator.wikimedia.org/T253152
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 19:16 herron: upgraded eqiad elk7 cluster from 7.4.2 to 7.8.0 [[phab:T234854|T234854]]
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 19:05 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.40  refs [[phab:T256668|T256668]]
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 18:51 elukey: update spark2 to 2.4.4-bin-hadoop2.6-3 for buster-wikimedia
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:44 mutante: stat1004, stat1006, stat1007 - upgrading git-review package from 1.25 to 1.27 so that it keeps working with new Gerrit 3.2 ([[phab:T257609|T257609]])
* 16:15 effie: enable puppet on mc-gp* hosts
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9f2557f848e99facaa62ca6b3a948cc3e32c32a3}}: Updating config for Readers Web affinity quicksurvey ([[phab:T246977|T246977]]) (duration: 01m 06s)
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 17:42 chaomodus: codfw frack management dns automation deployment complete [[phab:T233183|T233183]]
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 17:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 17:36 James_F: Synchronized wmf-config/CommonSettings.php: ExtensionDistribution: Drop REL1_33, EOL'ed [[phab:T256087|T256087]]
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 17:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 17:35 moritzm: rebooting moscovium for kernel update
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 17:33 chaomodus: deploying frack codfw management dns automation
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 17:32 crusnov@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 17:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 17:28 crusnov@cumin2001: START - Cookbook sre.dns.netbox
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 17:27 moritzm: rebooting planet1002 (planet.wikimedia.org) for kernel update
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 17:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 17:10 krinkle@deploy1001: Synchronized wmf-config/: {{Gerrit|Ia2f5eddbf2aad2}} (duration: 01m 04s)
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 17:09 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Ia2f5eddbf2aad2}} (duration: 01m 05s)
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 15:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 15:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 14:29 papaul: replacing msw-b1,b2,b3 and b4
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 14:03 moritzm: installing libtirpc security updates
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 13:45 moritzm: installing gnutls28 security updates
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089', diff saved to https://phabricator.wikimedia.org/P11831 and previous config saved to /var/cache/conftool/dbconfig/20200709-133134-marostegui.json
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 13:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 13:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 13:29 moritzm: rebooting puppetboard1001 (puppetboard.wikimedia.org) for kernel update
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 13:15 moritzm: installing ffmpeg security updates
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089', diff saved to https://phabricator.wikimedia.org/P11830 and previous config saved to /var/cache/conftool/dbconfig/20200709-131039-marostegui.json
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 13:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 13:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 13:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 12:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 12:57 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 12:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 12:56 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 12:56 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 12:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 12:54 moritzm: rebooting install* servers for kernel security update
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 12:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]
* 12:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:38 moritzm: rebooting urldownloader1001/2001 for kernel update (failed over, these are now the inactive ones)
* 12:23 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 12:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:22 moritzm: rebooting dbmonitor1001 / tendril.wikimedia.org for kernel update
* 12:11 XioNoX: enable asw2-b-eqiad:ae3 (to cloudsw1-c8) - [[phab:T251632|T251632]]
* 11:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:50 moritzm: rebooting debmonitor1001 for kernel update
* 11:42 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.40/extensions/Translate/tag/SpecialPageTranslation.php: {{Gerrit|6541d3ff51f52fe8a1bdbfa86022f8d97d6c7680}}: DeprecatablePropertyArray: Use MW_VERSION instead of array_key_exists ([[phab:T257531|T257531]]) (duration: 01m 05s)
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3a7c1c33e58637437f819edf039008a00dc5be27}}: Rename namespace on kn.wikipedia.org ([[phab:T255337|T255337]]) (duration: 01m 04s)
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0a3c1f94a702b527842ed4f34d8bf41b26235e64}}: Add *.oireachtas.ie to the wgCopyUploadsDomains whitelist for commonswiki ([[phab:T256543|T256543]]) (duration: 01m 04s)
* 11:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:10 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:10 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e6f442c6900524482806aeb1b5162e65bf7c97ac}}: Enable Quicksurveys for Desktop Improvements Project ([[phab:T246977|T246977]]) (duration: 01m 06s)
* 11:01 vgutierrez: restart ats-tls on cp1085
* 10:55 _joe_: restarting php7.2-fpm on mw1282, workers failing with sigill
* 10:54 _joe_: depool mw1282
* 10:54 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:34 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:23 _joe_: rolling restart the remaining restbases in eqiad, and all of codfw
* 10:22 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 10:09 _joe_: restarting restbase on rb1020-22
* 09:53 _joe_: restarting restbase on restbase1024,1023
* 09:36 _joe_: restarting restbase on rb1026,1027 to switch to proton on k8s
* 09:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 09:28 _joe_: restarting restbase on restbase1025 to pick up the switch to k8s of proton
* 09:27 godog: bounce thanos-compact on thanos-fe2001
* 09:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P11828 and previous config saved to /var/cache/conftool/dbconfig/20200709-085228-marostegui.json
* 08:44 marostegui: Stop haproxy on dbproxy1017 before upgrading to buster - [[phab:T255408|T255408]]
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1136', diff saved to https://phabricator.wikimedia.org/P11827 and previous config saved to /var/cache/conftool/dbconfig/20200709-082355-marostegui.json
* 08:23 moritzm: imported osm2pgsql 0.96.0+ds-1~bpo9+1 to "main" component [[phab:T256877|T256877]]
* 08:22 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 08:20 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 08:13 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 08:11 XioNoX: disable igmp snooping on msw1-codfw
* 07:59 marostegui: Stop db1117:3322 to clone db1084, this will trigger haproxy alerts - [[phab:T257540|T257540]]
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136', diff saved to https://phabricator.wikimedia.org/P11825 and previous config saved to /var/cache/conftool/dbconfig/20200709-075749-marostegui.json
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P11824 and previous config saved to /var/cache/conftool/dbconfig/20200709-053905-marostegui.json
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1084 from dbctl', diff saved to https://phabricator.wikimedia.org/P11823 and previous config saved to /var/cache/conftool/dbconfig/20200709-053206-marostegui.json
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084', diff saved to https://phabricator.wikimedia.org/P11822 and previous config saved to /var/cache/conftool/dbconfig/20200709-051826-marostegui.json
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317', diff saved to https://phabricator.wikimedia.org/P11821 and previous config saved to /var/cache/conftool/dbconfig/20200709-051355-marostegui.json
* 05:11 marostegui: Remove revision triggers from db2093:3315 [[phab:T238966|T238966]]
* 05:10 marostegui: Deploy schema change on s5 codfw, lag will be generated - [[phab:T238966|T238966]]
* 01:43 tzatziki: reset email for GseSro
* 00:58 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 sudo cumin A:cp 'enable-puppet "cdanis deploying {{Gerrit|I6c1b646e}} [[phab:T256395|T256395]]"'
* 00:49 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 sudo cumin A:cp 'disable-puppet "cdanis deploying {{Gerrit|I6c1b646e}} [[phab:T256395|T256395]]"'


== 2020-07-08 ==
== 2021-07-22 ==
* 21:56 mutante: deleting files from releases2001 that are not existing on releases1001 to make them mirrors. rsync with --delete and the command from quickdatacopy class ([[phab:T247652|T247652]])
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 21:55 mutante: rsyncing releases files from releases1001 to releases2002 and releases1002. deleting files from releases2002 not existing on releases1002 to make them mirrors ([[phab:T247652|T247652]])
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platform migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 20:59 cstone: civicrm revision changed from {{Gerrit|d73ee2e73f}} to {{Gerrit|8b09c87ce2}}
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 20:27 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T256012|T256012]])
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 20:08 Amir1_: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T256012|T256012]])
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 19:18 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.40  refs [[phab:T256668|T256668]] (duration: 01m 04s)
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 19:17 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.40  refs [[phab:T256668|T256668]]
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 18:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|091442cf035a6d76f1211291afbb3193c513595d}}: Add *.nga.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T256518|T256518]]) (duration: 01m 04s)
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 18:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2e5943ddb30e08607a9ffb6ed05a042e8367e2e1}}: Add scan-bugs.org to $wgCopyUploadsDomains ([[phab:T256569|T256569]]) (duration: 01m 04s)
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 18:46 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|f42cdf2}}: Change bnwiki logo ([[phab:T255328|T255328]]) (duration: 01m 04s)
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 18:27 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Cleanup: remove temporary wmgDisableHTCP variable gerrit:607596 [[phab:T250781|T250781]] IS.php (duration: 01m 01s)
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 18:20 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable HTCP purging everywhere gerrit:607593 [[phab:T250781|T250781]] CS.php (duration: 01m 03s)
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 18:18 ppchelko@deploy1001: Synchronized wmf-config/wikitech.php: Disable HTCP purging everywhere gerrit:607593 [[phab:T250781|T250781]] wikitech.php (duration: 01m 04s)
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 18:17 ppchelko@deploy1001: Synchronized wmf-config/reverse-proxy.php: Disable HTCP purging everywhere gerrit:607593 [[phab:T250781|T250781]] reverse-proxy.php (duration: 01m 04s)
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 18:11 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceDefault to refactor EventBus event stream config gerrit:610160 [[phab:T229863|T229863]], IS.php (duration: 01m 03s)
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 18:04 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceDefault to refactor EventBus event stream config gerrit:610160 [[phab:T229863|T229863]] (duration: 01m 04s)
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 17:34 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 17:16 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 17:16 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 17:08 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 16:57 _joe_: restarting restbase across the fleet to transition to using envoy
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobblin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 16:40 _joe_: restarting restbase on restbase2010 to route calls to mediawiki, parsoid via envoy
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 16:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 16:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 16:27 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 15:22 jgleeson: updated fundraising-tools from {{Gerrit|a244e0e85f}} --> {{Gerrit|f5b8528214}}
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 15:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 15:12 moritzm: rebooting people1002 (people.wikimedia.org) for kernel security update
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 15:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 14:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 14:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 14:46 moritzm: installing isc-dhcp security updates
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 14:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 14:31 moritzm: installing gdk-pixbuf security updates
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 14:26 _joe_: repooling mw1346
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 14:24 _joe_: php7adm /opcache-free on mw1346
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 14:15 jbond42: switch icinga authentication to CAS SSO
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 14:12 _joe_: depooling mw1346
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 14:12 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 14:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 14:04 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 14:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 14:04 moritzm: rebooting idp-test1001 for kernel update
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 13:59 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.stop-cluster (exit_code=97)
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 13:59 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 13:31 jynus: replacing ssh key for ci_docroot at deploy1001
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 13:31 moritzm: imported git 2.20.1-2+deb10u3~wmf1 for stretch-wikimedia component/git [[phab:T257308|T257308]]
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 13:10 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 13:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 13:00 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 12:41 marostegui: Deploy schema change on s7 codfw, lag is expected
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 12:17 xionox-tmp: rollout less frequent option-refresh-rate - [[phab:T240658|T240658]]
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 12:01 xionox-tmp: renumber eqiad NTT link - [[phab:T254877|T254877]]
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 11:42 awight: EU BACON complete
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 11:41 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:610234{{!}}Undeploy graphoid for phase 1 wikis (T257402)]] (duration: 01m 03s)
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 11:31 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:610268{{!}}Add nature.com to commonswiki wgCopyUploadDomains (T254342)]] (duration: 01m 03s)
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 11:29 moritzm: installing freetype security updates
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 11:26 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:609991{{!}}[hiwikibooks] Translate sitename for hi.wikibooks (T256587)]] (duration: 01m 03s)
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 11:19 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:609990{{!}}[arwiki] Grant 'patrolmarks' to all (T257106)]] (duration: 01m 04s)
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 11:18 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 11:18 moritzm: installing libgcrypt20 security updates
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 11:16 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 11:07 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:610056{{!}}Provision WMDE TeWü survey for prototype 1 (T257306)]], file 2/2 (duration: 01m 03s)
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 11:06 awight@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BACON: [[gerrit:610056{{!}}Provision WMDE TeWü survey for prototype 1 (T257306)]], file 1/2 (duration: 01m 16s)
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 11:05 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:27 moritzm: installing libwebp security updates on stretch
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P11818 and previous config saved to /var/cache/conftool/dbconfig/20200708-110546-marostegui.json
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 10:51 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:51 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 10:50 akosiaris: apply calico egress policies
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:50 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 10:45 moritzm: installing json-c security updates
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P11817 and previous config saved to /var/cache/conftool/dbconfig/20200708-102553-marostegui.json
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1084', diff saved to https://phabricator.wikimedia.org/P11816 and previous config saved to /var/cache/conftool/dbconfig/20200708-102500-marostegui.json
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084', diff saved to https://phabricator.wikimedia.org/P11815 and previous config saved to /var/cache/conftool/dbconfig/20200708-101313-marostegui.json
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 09:58 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 09:56 kormat@cumin2001: START - Cookbook sre.hosts.downtime
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 09:50 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 11:36 Lucas_WMDE: EU backport+config window done
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1149', diff saved to https://phabricator.wikimedia.org/P11814 and previous config saved to /var/cache/conftool/dbconfig/20200708-094539-marostegui.json
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149', diff saved to https://phabricator.wikimedia.org/P11813 and previous config saved to /var/cache/conftool/dbconfig/20200708-092650-marostegui.json
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P11812 and previous config saved to /var/cache/conftool/dbconfig/20200708-092627-marostegui.json
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 09:24 xionox-tmp: renumber eqord NTT link - [[phab:T254877|T254877]]
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 09:18 xionox-tmp: remove eqord-eqiad tunnel - [[phab:T254877|T254877]]
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P11811 and previous config saved to /var/cache/conftool/dbconfig/20200708-091557-marostegui.json
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1147', diff saved to https://phabricator.wikimedia.org/P11810 and previous config saved to /var/cache/conftool/dbconfig/20200708-085745-marostegui.json
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 08:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 10:42 effie: restart pybal on lvs2010 and lvs1016
* 08:54 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 08:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P11809 and previous config saved to /var/cache/conftool/dbconfig/20200708-085024-marostegui.json
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074', diff saved to https://phabricator.wikimedia.org/P11808 and previous config saved to /var/cache/conftool/dbconfig/20200708-084227-marostegui.json
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 08:40 moritzm: upgrading docker on remaining buster hosts
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 08:38 hashar: Upgraded docker.io on contint1001 and contint2001
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:28 marostegui: Remove dbproxy1003 grants from misc hosts [[phab:T231280|T231280]]
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11807 and previous config saved to /var/cache/conftool/dbconfig/20200708-082624-marostegui.json
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11806 and previous config saved to /var/cache/conftool/dbconfig/20200708-082040-marostegui.json
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11805 and previous config saved to /var/cache/conftool/dbconfig/20200708-081647-marostegui.json
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:15 kormat@cumin1001: dbctl commit (dc=all): 'Depool es2020 for reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11804 and previous config saved to /var/cache/conftool/dbconfig/20200708-081519-kormat.json
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 08:00 marostegui: Failover m1 from db1097 to db1080 - [[phab:T256717|T256717]]
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 07:57 kormat: reimaging es2020 to buster [[phab:T257284|T257284]]
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 07:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11803 and previous config saved to /var/cache/conftool/dbconfig/20200708-074939-marostegui.json
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 07:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 07:48 jynus: stop bacula-director on backup1001 in preparation for m1 switchover [[phab:T256717|T256717]]
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 07:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 07:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 07:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 07:47 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 07:45 moritzm: installing PHP 7.3 security updates
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143', diff saved to https://phabricator.wikimedia.org/P11802 and previous config saved to /var/cache/conftool/dbconfig/20200708-073548-marostegui.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P11801 and previous config saved to /var/cache/conftool/dbconfig/20200708-073037-marostegui.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1142', diff saved to https://phabricator.wikimedia.org/P11800 and previous config saved to /var/cache/conftool/dbconfig/20200708-073011-marostegui.json
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P11799 and previous config saved to /var/cache/conftool/dbconfig/20200708-072431-marostegui.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1141', diff saved to https://phabricator.wikimedia.org/P11798 and previous config saved to /var/cache/conftool/dbconfig/20200708-070921-marostegui.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P11797 and previous config saved to /var/cache/conftool/dbconfig/20200708-070432-marostegui.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1138', diff saved to https://phabricator.wikimedia.org/P11796 and previous config saved to /var/cache/conftool/dbconfig/20200708-070403-marostegui.json
* 06:47 marostegui: start topology changes on m1 [[phab:T256717|T256717]]
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P11795 and previous config saved to /var/cache/conftool/dbconfig/20200708-064354-marostegui.json
* 06:36 marostegui: Deploy schema change on s2 primary master db1122 [[phab:T238966|T238966]]
* 06:18 _joe_: rolling restart of restbase to pick up the proton url change
* 03:36 andrew@deploy1001: Finished deploy [horizon/deploy@505819d]: further fixes for proxy editing --bug 610130 (duration: 03m 44s)
* 03:32 andrew@deploy1001: Started deploy [horizon/deploy@505819d]: further fixes for proxy editing --bug 610130


== 2020-07-07 ==
== 2021-07-21 ==
* 22:41 mutante: new Wikimedia Annual Report 2019 now available on annual.wikimedia.org
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 21:29 andrew@deploy1001: Finished deploy [horizon/deploy@fce8183]: further fixes for proxy editing --bug 610130 (duration: 03m 35s)
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 21:25 andrew@deploy1001: Started deploy [horizon/deploy@fce8183]: further fixes for proxy editing --bug 610130
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 21:10 andrew@deploy1001: Finished deploy [horizon/deploy@abcd051]: further fixes for proxy editing --bug 610130 (duration: 03m 26s)
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 21:07 andrew@deploy1001: Started deploy [horizon/deploy@abcd051]: further fixes for proxy editing --bug 610130
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@05b8bd5]: Remove restbase2009, take 2 (duration: 09m 15s)
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 20:32 ppchelko@deploy1001: Started deploy [restbase/deploy@05b8bd5]: Remove restbase2009, take 2
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@05b8bd5]: Remove restbase2009 (duration: 14m 28s)
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 20:24 mutante: kubernetes1003 - starting nagios-nrpe-server
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 20:23 mutante: kubernetes1001 - starting nagios-nrpe-server
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:17 ppchelko@deploy1001: Started deploy [restbase/deploy@05b8bd5]: Remove restbase2009
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 19:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 20:27 dancy: testing upcoming Scap release on beta
* 19:27 mutante: destroying VM gerrit1002 - decom cookbook
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 19:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 19:18 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.40  refs [[phab:T256668|T256668]]
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 19:04 mutante: contint2001 - move /var/lib/zuul/.ssh/known_hosts to root and run puppet to recreate it
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 18:38 andrew@deploy1001: Finished deploy [horizon/deploy@eaa056e]: fix for proxy editing --bug 610130 (duration: 03m 18s)
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 18:35 andrew@deploy1001: Started deploy [horizon/deploy@eaa056e]: fix for proxy editing --bug 610130
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 18:27 andrew@deploy1001: Finished deploy [horizon/deploy@a39e86c]: update proxy UI to support editing existing proxies (duration: 03m 26s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 18:23 andrew@deploy1001: Started deploy [horizon/deploy@a39e86c]: update proxy UI to support editing existing proxies
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 18:10 krinkle@deploy1001: Synchronized w/: remove untracked test cookie file (duration: 01m 04s)
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 18:08 krinkle@deploy1001: Synchronized php-1.35.0-wmf.40/includes/Revision/RevisionStore.php: {{Gerrit|I8f986daeab4}} (duration: 01m 05s)
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 17:59 herron: imported (logstash{{!}}kibana{{!}}elasticsearch)-oss-7.8.0 into buster-wikimedia thirdparty/elastic78
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 17:54 hnowlan: finished removing restbase2009 from cassandra pool
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 17:06 hnowlan: removed restbase2009-b from cassandra pool, removing restbase2009-c
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 16:40 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.40/extensions/Wikibase: Backport: [[gerrit:610086{{!}}Revert "Don’t load $wgWBClientSettings in WikibaseClient.php" (T257296)]] (duration: 01m 10s)
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 15:49 hnowlan: running nodetool removenode for restbase2009-a
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 15:38 hnowlan@deploy1001: Started restart [restbase/deploy@05b8bd5]: Restarting restbase after removal of restbase2009
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 15:27 elukey: root-tmux on cumin1001 - cumin 'c:profile::mediawiki::mcrouter_wancache' '/usr/local/sbin/restart-mcrouter' -b 2 -s 5 - roll restart of mw-mcrouter to pick up new settings - [[phab:T255511|T255511]]
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 15:13 hnowlan@deploy1001: Started restart [restbase/deploy@05b8bd5]: Restarting restbase after removal of restbase2009
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:12 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:12 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 15:09 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 15:09 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 15:06 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 15:04 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 15:04 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 15:02 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 15:02 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 15:01 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 14:58 hashar@deploy1001: Finished deploy [integration/docroot@708d3eb]: Second deployment to ensure everything works fine. Thank you jynus (duration: 00m 04s)
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 14:58 hashar@deploy1001: Started deploy [integration/docroot@708d3eb]: Second deployment to ensure everything works fine. Thank you jynus
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 14:53 _joe_: restarted restbase on restbase2022 after removing restbase2009 from the cassandra seeds
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 14:48 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 14:47 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 14:38 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 14:38 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 14:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 14:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 14:30 papaul: replacing msw-a5,a6,a7 and a8
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 14:30 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 14:24 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 14:24 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 14:20 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 14:20 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 14:16 hashar@deploy1001: Finished deploy [integration/docroot@708d3eb]: (no justification provided) (duration: 00m 09s)
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 14:16 hashar@deploy1001: Started deploy [integration/docroot@708d3eb]: (no justification provided)
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 13:38 _joe_: rolling restart of restbase to pick up using envoy
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 13:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 13:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 13:29 XioNoX: cr2-eqiad> request vmhost snapshot routing-engine both - [[phab:T257153|T257153]]
* 10:50 moritzm: installing systemd security updates on bullseye
* 13:24 XioNoX: cr1-eqiad> request vmhost snapshot routing-engine both - [[phab:T257153|T257153]]
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 13:15 kormat@cumin1001: dbctl commit (dc=all): 'Promote es2021 to es4 master [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11789 and previous config saved to /var/cache/conftool/dbconfig/20200707-131524-kormat.json
* 10:14 effie: enable puppet on mw* servers
* 12:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 12:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 12:44 kormat: starting (codfw) es5 failover from es2020 to es2021 [[phab:T257284|T257284]]
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 12:30 kormat@cumin1001: dbctl commit (dc=all): 'Set es2021 to weight 50 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11787 and previous config saved to /var/cache/conftool/dbconfig/20200707-123003-kormat.json
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:12 jforrester@deploy1001: Finished scap: Full scap and testwikis to 1.35.0-wmf.40 for [[phab:T256668|T256668]] (duration: 33m 09s)
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 12:01 marostegui: Deploy schema change on labswiki (wikitech) master - [[phab:T253276|T253276]]
* 08:45 effie: disable puppet on codfw mw hosts to deploy 702592
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1082', diff saved to https://phabricator.wikimedia.org/P11786 and previous config saved to /var/cache/conftool/dbconfig/20200707-115838-marostegui.json
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:39 jforrester@deploy1001: Started scap: Full scap and testwikis to 1.35.0-wmf.40 for [[phab:T256668|T256668]]
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 11:38 jforrester@deploy1001: scap failed: LockFailedError Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "jforrester"; reason is "testwikis wikis to 1.35.0-wmf.40" (duration: 00m 00s)
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 11:33 moritzm: installing PHP 7.0 security updates
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 11:29 marostegui: Deploy schema change on db1082, this will create lag on s5 labs
* 08:17 effie: enable puppet on alert*
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P11784 and previous config saved to /var/cache/conftool/dbconfig/20200707-112926-marostegui.json
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11783 and previous config saved to /var/cache/conftool/dbconfig/20200707-112830-marostegui.json
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 11:26 godog: test bumping logstash7 batch size to 256
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 11:17 moritzm: prune PHP 7.0 packages from mwdebug1001/2001/2002
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P11782 and previous config saved to /var/cache/conftool/dbconfig/20200707-110506-marostegui.json
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110', diff saved to https://phabricator.wikimedia.org/P11781 and previous config saved to /var/cache/conftool/dbconfig/20200707-110412-marostegui.json
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 10:57 moritzm: prune PHP 7.0 packages from mw2190-mw2214
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 10:46 jforrester@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.40
* 07:16 godog: powercycle ms-be2048
* 10:44 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.38 (duration: 17m 23s)
* 07:03 moritzm: installing systemd security updates on stretch
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P11780 and previous config saved to /var/cache/conftool/dbconfig/20200707-103255-marostegui.json
* 06:51 effie: restart memcached on eqiad mc* hosts
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P11779 and previous config saved to /var/cache/conftool/dbconfig/20200707-102757-marostegui.json
* 06:51 effie: enable puppet on mc* hosts
* 10:26 moritzm: prune PHP 7.0 packages from mw2135-mw2147
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 10:12 addshore@deploy1001: Synchronized wmf-config/config/testcommonswiki.yaml: [[gerrit:609985]] Make testcommonswiki a testwikidata client [[phab:T257266|T257266]] PT2/2 (duration: 00m 55s)
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:11 addshore@deploy1001: sync-file aborted: [[gerrit:609985]] Make testcommonswiki a testwikidata client [[phab:T257266|T257266]] PT1/2 (duration: 00m 00s)
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315', diff saved to https://phabricator.wikimedia.org/P11778 and previous config saved to /var/cache/conftool/dbconfig/20200707-101043-marostegui.json
* 10:10 addshore@deploy1001: Synchronized dblists/wikidataclient-test.dblist: [[gerrit:609985]] Make testcommonswiki a testwikidata client [[phab:T257266|T257266]] PT1/2 (duration: 00m 56s)
* 10:08 addshore@deploy1001: sync-file aborted: [[gerrit:609985]] Make testcommonswiki a testwikidata client [[phab:T257266|T257266]] PT1/2 (duration: 00m 36s)
* 10:06 elukey: decommission archiva1001
* 10:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11777 and previous config saved to /var/cache/conftool/dbconfig/20200707-100328-marostegui.json
* 10:03 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 10:03 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 10:03 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11776 and previous config saved to /var/cache/conftool/dbconfig/20200707-095443-marostegui.json
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P11775 and previous config saved to /var/cache/conftool/dbconfig/20200707-095428-marostegui.json
* 09:42 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:609971]] [[phab:T257266|T257266]] Enable sitelinks to testcommons from test wikidata sites (duration: 00m 56s)
* 09:40 kormat@cumin1001: dbctl commit (dc=all): 'Repool es2021 after reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11774 and previous config saved to /var/cache/conftool/dbconfig/20200707-094017-kormat.json
* 09:37 addshore@deploy1001: Synchronized wmf-config: [[gerrit:609986]] [[phab:T257266|T257266]] [[phab:T241975|T241975]] Wikibase: Remove config option wmgUseEntitySourceBasedFederation (take2) (duration: 00m 57s)
* 09:36 _joe_: errata: restbase2010, not 2009
* 09:36 _joe_: applying the new configuration using the service proxy to restbase2009 too
* 09:34 godog: bounce logstash on logstash1023
* 09:33 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:609645]] [[phab:T257266|T257266]] [[phab:T241975|T241975]] Wikibase: stop using wmgUseEntitySourceBasedFederation (take2) (duration: 00m 59s)
* 09:33 _joe_: depooling restbase1025 while we fix the troubled relationship between envoy and proton
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P11773 and previous config saved to /var/cache/conftool/dbconfig/20200707-093345-marostegui.json
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es1024 as it is the current master [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11772 and previous config saved to /var/cache/conftool/dbconfig/20200707-092635-marostegui.json
* 09:24 James_F: 1.35.0-wmf.40 was branched at {{Gerrit|88ecd6df00a46e432c06c1cf40d5098128abc4d8}} for [[phab:T256668|T256668]]
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1023 after reimage [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11771 and previous config saved to /var/cache/conftool/dbconfig/20200707-092357-marostegui.json
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1023 after reimage [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11770 and previous config saved to /var/cache/conftool/dbconfig/20200707-091015-marostegui.json
* 08:33 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1023 after reimage [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11769 and previous config saved to /var/cache/conftool/dbconfig/20200707-083144-marostegui.json
* 08:30 kormat@cumin2001: START - Cookbook sre.hosts.downtime
* 08:26 XioNoX: cr2-codfw> request vmhost snapshot routing-engine both - [[phab:T257153|T257153]]
* 08:22 XioNoX: cr2-eqsin> request vmhost snapshot - [[phab:T257153|T257153]]
* 08:19 XioNoX: cr2-eqord> request vmhost snapshot - [[phab:T257153|T257153]]
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1023 after reimage [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11768 and previous config saved to /var/cache/conftool/dbconfig/20200707-081909-marostegui.json
* 08:18 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.change-distro (exit_code=97)
* 08:17 XioNoX: cr2-eqdfw> request vmhost snapshot - [[phab:T257153|T257153]]
* 08:15 XioNoX: cr3-knams> request vmhost snapshot - [[phab:T257153|T257153]]
* 08:15 hashar: upgrading and restart CI Jenkins on contint2001 # [[phab:T256978|T256978]]
* 08:12 XioNoX: cr4-ulsfo> request vmhost snapshot - [[phab:T257153|T257153]]
* 08:09 kormat@cumin1001: dbctl commit (dc=all): 'Depool es2021 for reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11767 and previous config saved to /var/cache/conftool/dbconfig/20200707-080914-kormat.json
* 07:50 marostegui: Stop MySQL on db1074 to deploy schema change and remove triggers - [[phab:T238966|T238966]]
* 07:45 _joe_: restarting restbase again on rb1025
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for schema change', diff saved to https://phabricator.wikimedia.org/P11766 and previous config saved to /var/cache/conftool/dbconfig/20200707-074435-marostegui.json
* 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079 and db1136  [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11765 and previous config saved to /var/cache/conftool/dbconfig/20200707-073918-marostegui.json
* 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:31 _joe_: restarting restbase on restbase1025, reaching proton via envoy for now
* 07:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:609644{{!}}Revert "Commons: Define entity sources configuration" (T256906, T256907, T256909, T254315, T257266)]] (forgot to git rebase so the last sync was a no-op) (duration: 00m 56s)
* 07:27 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 07:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:609644{{!}}Revert "Commons: Define entity sources configuration" (T256906, T256907, T256909, T254315, T257266)]] (duration: 00m 53s)
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 and give more main weight to db1136  [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11764 and previous config saved to /var/cache/conftool/dbconfig/20200707-072703-marostegui.json
* 07:24 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: Config: [[gerrit:609643{{!}}Revert "Wikidata client wikis: Define entity sources configuration (take 2)" (T254315, T257266)]] (duration: 00m 56s)
* 07:24 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 07:23 lucaswerkmeister-wmde@deploy1001: Synchronized dblists/wikidataclient.dblist: Config: [[gerrit:609643{{!}}Revert "Wikidata client wikis: Define entity sources configuration (take 2)" (T254315, T257266)]] (duration: 00m 56s)
* 07:19 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:609642{{!}}Revert "Wikibase: stop using wmgUseEntitySourceBasedFederation" (T241975, T257266)]] (duration: 00m 55s)
* 07:16 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 07:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:609641{{!}}Revert "Wikibase: Remove config option wmgUseEntitySourceBasedFederation" (T241975, T257266)]] (duration: 00m 57s)
* 07:10 _joe_: restart restbase on restbase1025 to pick up the switch to https for cxserver
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 and give more main weight to db1136  [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11762 and previous config saved to /var/cache/conftool/dbconfig/20200707-063737-marostegui.json
* 06:29 marostegui: Reimage es1023 to Buster [[phab:T255755|T255755]]
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1136 some weight back into main traffic [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11761 and previous config saved to /var/cache/conftool/dbconfig/20200707-062008-marostegui.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11760 and previous config saved to /var/cache/conftool/dbconfig/20200707-061849-marostegui.json
* 05:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Enable es5 writes [[phab:T255755|T255755]] (duration: 00m 56s)
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1023 entirely [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11759 and previous config saved to /var/cache/conftool/dbconfig/20200707-051620-marostegui.json
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1024 to es5 master [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11758 and previous config saved to /var/cache/conftool/dbconfig/20200707-051236-marostegui.json
* 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Disable es5 writes [[phab:T255755|T255755]] (duration: 00m 56s)
* 05:01 marostegui: "Starting es failover from es1023 to es1024 - https://phabricator.wikimedia.org/T255755"
* 01:05 ejegg: turned on debug logging for Adyen SmashPig
* 00:22 cstone: civicrm revision changed from {{Gerrit|a48caf0f37}} to {{Gerrit|d73ee2e73f}}


== 2020-07-06 ==
== 2021-07-20 ==
* 23:32 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable sidebar instrumentation on test wikipedia (duration: 00m 56s)
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 23:32 eileen: process-control config revision is {{Gerrit|3fe6753e56}}
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 23:22 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change some zh canonical namespaces. Don
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 17:06 rzl: enabled puppet on A:mw
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector:


== 2020-07-05 ==
== 2021-07-06 ==
* 21:50 qchris: Restarting gerrit on gerrit1001 to pick up new war and jars.
* 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 21:50 qchris@deploy1001: Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1001 (duration: 00m 07s)
* 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:50 qchris@deploy1001: Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1001
* 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:46 qchris: Restarting gerrit on gerrit2001 to pick up new war and jars.
* 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 21:45 qchris@deploy1001: Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit2001 (duration: 00m 10s)
* 17:25 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0] (duration: 05m 31s)
* 21:45 qchris@deploy1001: Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit2001
* 17:20 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0]
* 21:32 qchris: Restarting gerrit on gerrit1002 to pick up new wars and jars.
* 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0] (duration: 00m 07s)
* 21:32 qchris@deploy1001: Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13 and zuul plugin to master-0-g7accc67 (duration: 00m 08s)
* 17:19 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0]
* 21:32 qchris@deploy1001: Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13 and zuul plugin to master-0-g7accc67
* 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0] (duration: 36m 59s)
* 21:20 qchris: Enable puppet on gerrit1002 (gerrit-test) again to let it catch up again
* 16:42 joal@deploy1002: Started deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0]
* 16:01 gehel: restart elastic-psi on elastic1052 (high GC rate)
* 15:54 otto@deploy1002: Finished deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration (duration: 05m 24s)
* 15:56 gehel: restart blazegraph + updater on wdqs1007 and depool to allow catching up on lag
* 15:48 otto@deploy1002: Started deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration
* 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16777 and previous config saved to /var/cache/conftool/dbconfig/20210706-140049-root.json
* 13:53 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
* 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
* 13:49 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
* 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
* 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16776 and previous config saved to /var/cache/conftool/dbconfig/20210706-134545-root.json
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16775 and previous config saved to /var/cache/conftool/dbconfig/20210706-133041-root.json
* 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16774 and previous config saved to /var/cache/conftool/dbconfig/20210706-131537-root.json
* 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16773 and previous config saved to /var/cache/conftool/dbconfig/20210706-120242-root.json
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P16772 and previous config saved to /var/cache/conftool/dbconfig/20210706-115820-marostegui.json
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16771 and previous config saved to /var/cache/conftool/dbconfig/20210706-115732-marostegui.json
* 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16770 and previous config saved to /var/cache/conftool/dbconfig/20210706-114739-root.json
* 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16769 and previous config saved to /var/cache/conftool/dbconfig/20210706-113235-root.json
* 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16768 and previous config saved to /var/cache/conftool/dbconfig/20210706-111731-root.json
* 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2071', diff saved to https://phabricator.wikimedia.org/P16767 and previous config saved to /var/cache/conftool/dbconfig/20210706-111635-marostegui.json
* 10:19 moritzm: installing jackson-databind security updates on buster
* 09:01 _joe_: repooling wdqs1007 now that lag has caught up
* 08:43 moritzm: installing libuv1 security updates on buster
* 07:06 marostegui: Upgrade db1104 kernel
* 06:54 moritzm: installing PHP 7.3 security updates on buster
* 06:50 marostegui: Upgrade db1122 kernel
* 06:35 marostegui: Upgrade db1138 kernel
* 06:31 marostegui: Upgrade db1160 kernel
* 00:56 eileen: process-control config revision is {{Gerrit|8d46b52ed4}}


== 2020-07-04 ==
== 2021-07-05 ==
* 19:23 qchris@deploy1001: Finished deploy [gerrit/gerrit@b78914b]: Bump gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1002 (duration: 00m 08s)
* 17:40 legoktm: published fixed docker-registry.discovery.wmnet/nodejs10-devel:0.0.4 image ([[phab:T286212|T286212]])
* 19:23 qchris@deploy1001: Started deploy [gerrit/gerrit@b78914b]: Bump gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1002
* 15:24 _joe_: leaving wdqs1007 depooled so that the updater can recover faster, now at 16.5 hours of lag
* 14:05 qchris: Disable puppet on gerrit1002 (gerrit-test) to deploy Gerrit UI updates there to gather feedback
* 14:01 moritzm: uploaded nginx 1.13.9-1+wmf3 for stretch-wikimedia
* 12:42 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 24s)
* 12:50 marostegui: Stop MySQL on db1117:3321 to clone db1125 [[phab:T286042|T286042]]
* 02:28 reedy@deploy1001: Synchronized php-1.35.0-wmf.39/extensions/Score/includes/Score.php: Short circuit lilypond version check to allow usage of cached files [[phab:T257066|T257066]] (duration: 00m 55s)
* 11:29 moritzm: installing openexr security updates on stretch
* 11:07 moritzm: installing tiff security updates on stretch
* 10:48 moritzm: upgrading PHP on miscweb*
* 10:37 jbond: enable puppet fleet wide to post puppetdb change
* 10:29 marostegui: Optimize ruwiki.logging on s6 eqiad with replication [[phab:T286102|T286102]]
* 10:27 jbond: disable puppet fleet wide to perform puppetdb change
* 08:15 moritzm: rolling out debmonitor-client 0.3.0
* 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
* 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
* 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
* 07:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
* 07:04 _joe_: restarting blazegraph, then restarting the updater again
* 06:48 moritzm: start rasdaemon on sretest1001, didn't start after last reboot from a week ago
* 06:47 _joe_: restart wdqs-updater on wdqs1007
* 00:53 eileen: process-control config revision is {{Gerrit|a1717c7fde}}
* 00:47 eileen: process-control config revision is {{Gerrit|24565578f7}}


== 2020-07-03 ==
== 2021-07-04 ==
* 21:49 reedy@deploy1001: Synchronized php-1.35.0-wmf.39/extensions/Score/: Sync maintenance script (duration: 00m 58s)
* 17:43 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:702957{{!}}Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage" (T286140)]] (duration: 01m 06s)
* 18:47 cdanis: ✔️ cdanis@an-coord1001.eqiad.wmnet ~ 🕒☕ sudo systemctl restart hive-server2.service
* 08:02 elukey: repool eqsin after equinix maintenance - [[phab:T286113|T286113]]
* 16:51 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Ifa929b2ad4}} (duration: 00m 57s)
* 16:02 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Rename wgRestrictionMethod to wgShellRestrictionMethod (duration: 00m 58s)
* 15:46 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:43 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:43 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1118 weight to spread load mode evenly', diff saved to https://phabricator.wikimedia.org/P11730 and previous config saved to /var/cache/conftool/dbconfig/20200703-154337-jynus.json
* 15:40 jayme@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:38 jayme@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:09 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 15:02 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 14:11 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.stop-cluster (exit_code=99)
* 14:11 _joe_: restarted php-fpm on wtp1033, stuck in sigill
* 13:59 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 12:41 hashar: Restarting Zuul / CI
* 11:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:29 moritzm: rebooting urldownloader standby hosts for kernel updates (1002/2002)
* 10:59 moritzm: installing json-c security updates on jessie
* 10:51 moritzm: installing ruby-json security updates
* 10:25 moritzm: installing nss security updates on jessie
* 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:15 elukey: notebook1004 renamed to an-scheduler1001
* 10:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:07 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:56 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:43 moritzm: rebooting netflow* hosts for kernel security update
* 08:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:04 jayme: authdns-update for chartmuseum - [[phab:T256970|T256970]]
* 08:03 elukey@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 07:55 moritzm: installing mutt security updates for jessie (stretch/buster already fixed)
* 07:44 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:39 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 07:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 06:47 moritzm: installing php5 security updates
* 06:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 06:09 moritzm: rebooting mw1390-mw1419 for kernel security updates
* 05:46 XioNoX: remove chassis redundancy failover from fasw-c-eqiad for consistency with all other VCs
* 05:33 XioNoX: remove chassis redundancy failover from fasw-c-codfw for consistency with all other VCs


== 2020-07-02 ==
== 2021-07-03 ==
* 23:22 jhuneidi@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 17:46 elukey: depool eqsin due to loss of power redundancy (equinix maintenance) - [[phab:T286113|T286113]]
* 23:16 jhuneidi@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 09:12 Amir1: restarting mailman3-web on lists1001 to pick up patches for [[phab:T283659|T283659]]
* 22:03 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 08:53 Amir1: patching postorius and mailmanclient on lists1001 for [[phab:T283659|T283659]]
* 21:56 mutante: gerrit1001 (prod gerrit) - restarting gerrit service
* 21:52 maryum: frwikibooks reindex successful, continuing on with remainder of French wikis
* 21:32 mutante: gerrit - deleted gerrit db_pass from prod private repo, running puppet
* 21:25 mutante: gerrit2001 - restarted gerrit
* 21:14 mutante: gerrit1002 restarted gerrit
* 20:20 maryum: reindexing frwikibooks to test https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/604221
* 19:52 mutante: gerrit2001 - restarting gerrit after removing db_pass from config
* 16:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 16:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:07 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:42 moritzm: rebooting mw1370-mw1389 for kernel security updates
* 14:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:03 kormat: stopped mariadb@s8 on dbstore1005 for data restoration [[phab:T256966|T256966]]
* 12:43 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:31 moritzm: rebooting mw1349-mw1369 for kernel security updates
* 12:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 12:27 vgutierrez: rolling restart of esams load balancers to catch up on kernel upgrades
* 12:12 XioNoX: pre-configure asw2-b-eqiad<->cloudsw1-c8-eqiad - [[phab:T251632|T251632]]
* 12:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:33 vgutierrez: rolling restart of codfw load balancers to catch up on kernel upgrades
* 11:18 akosiaris: proactively restart docker-registry on registry1001, registry1002 to force CA refresh
* 11:16 akosiaris: restart docker-registry on registry2002 for CA refresh
* 11:14 _joe_: restarting docker-registry on registry2001
* 10:34 godog: move "cluster overview" dashboard to Thanos - [[phab:T256954|T256954]]
* 09:35 XioNoX: advertise codfw prefixes from eqord
* 09:28 jayme: imported chartmuseum_0.12.0-2 to buster-wikimedia - [[phab:T253843|T253843]]
* 09:07 addshore: addshore@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki testwikidatawiki --force --custom-groups oversight "DCausse_(WMF)" # [[phab:T256949|T256949]]
* 09:07 addshore: addshore@mwmaint1002:~$ mwscript maintenance/createAndPromote.php --wiki testwikidatawiki --force --custom-groups oversight "Addshore" # [[phab:T256949|T256949]]
* 08:59 XioNoX: deploy flex flow for MX204s - [[phab:T248394|T248394]]
* 05:52 _joe_: removing all tags for envoy-tls-local-proxy
* 05:46 _joe_: upload docker-report 0.0.4 on buster-wikimedia [[phab:T242604|T242604]]
* 04:32 eileen: process-control config revision is {{Gerrit|b4655897b5}}
* 03:17 eileen: process-control config revision is {{Gerrit|12fe6b5151}}
* 03:15 eileen: tools revision changed from {{Gerrit|4ea8567819}} to {{Gerrit|e974147f27}}
* 02:32 eileen: tools revision changed from {{Gerrit|e38f7a83d4}} to {{Gerrit|4ea8567819}}
* 00:53 eileen: tools revision changed from {{Gerrit|806e2b4412}} to {{Gerrit|e38f7a83d4}}


== 2020-07-01 ==
== 2021-07-02 ==
* 23:53 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Set $wgForceUIAsContentMsg for zhwikibooks, zhwikinews, zhwikiquote, zhwikisource, zhwikiversity, zhwiktionary ([[phab:T256521|T256521]]) (duration: 00m 55s)
* 22:06 foks: removing three files for legal compliance
* 23:35 ejegg: updated fundraising CiviCRM from {{Gerrit|391d0fdf75}} to {{Gerrit|a48caf0f37}}
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:32 catrope@deploy1001: Synchronized static/images/project-logos/: Change Simplified Chinese logo for zhwiki ([[phab:T256839|T256839]]) (duration: 00m 55s)
* 19:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 23:18 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Ibb42db7fd1ee}} (duration: 00m 55s)
* 18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:00 bstorm: set a short downtime on labstore1006/7 to prevent alert while disabling direct systemd monitoring
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 22:37 krinkle@deploy1001: Synchronized php-1.35.0-wmf.39/includes/Title.php: {{Gerrit|I8d5bad9c654c4ab}} (duration: 01m 00s)
* 18:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:00 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:58 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:56 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 18:22 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:56 Krinkle: krinkle@deploy1001 Ran `scap deploy --init` for /srv/deployment/performance/arc-lamp
* 18:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:55 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@d7476f5]: Update mobileapps to {{Gerrit|953fc41a}} (duration: 04m 08s)
* 18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:51 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@d7476f5]: Update mobileapps to {{Gerrit|953fc41a}}
* 15:59 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:27 eileen: tools revision changed from {{Gerrit|6f38c14fe3}} to {{Gerrit|806e2b4412}}
* 15:54 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 20:11 eileen: tools revision changed from {{Gerrit|aab96444df}} to {{Gerrit|6f38c14fe3}}
* 15:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:23 twentyafterfour: 1.35.0-wmf.39 is now deployed to group2 wikis, everything appears to be normal. refs [[phab:T254176|T254176]]
* 15:17 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dragonfly-supernode1001.eqiad.wmnet
* 19:18 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.35.0-wmf.39 refs [[phab:T254176|T254176]]
* 15:07 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 18:44 addshore@deploy1001: Synchronized wmf-config: REVERT [[phab:T254315|T254315]] Wikidata client wikis: Define entity sources configuration [[gerrit:569259]] (duration: 01m 04s)
* 15:05 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dragonfly-supernode1001.eqiad.wmnet
* 18:41 addshore@deploy1001: sync-file aborted: [[phab:T254315|T254315]] Wikidata client wikis: Define entity sources configuration [[gerrit:569259]] (duration: 00m 38s)
* 15:02 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 18:38 joal@deploy1001: Finished deploy [analytics/refinery@8b7bddf] (thin): Regular analytics weekly train THIN [analytics/refinery@8b7bddf] (duration: 02m 19s)
* 14:54 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 18:36 joal@deploy1001: Started deploy [analytics/refinery@8b7bddf] (thin): Regular analytics weekly train THIN [analytics/refinery@8b7bddf]
* 14:53 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
* 18:35 joal@deploy1001: Finished deploy [analytics/refinery@8b7bddf]: Regular analytics weekly train [analytics/refinery@8b7bddf] (duration: 08m 09s)
* 14:52 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 18:27 joal@deploy1001: Started deploy [analytics/refinery@8b7bddf]: Regular analytics weekly train [analytics/refinery@8b7bddf]
* 14:40 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[0-1].eqiad.wmnet
* 18:25 joal@deploy1001: Finished deploy [analytics/refinery@114bfed]: Regular analytics weekly train [analytics/refinery@114bfed] (duration: 03m 41s)
* 14:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-9].eqiad.wmnet
* 18:21 joal@deploy1001: Started deploy [analytics/refinery@114bfed]: Regular analytics weekly train [analytics/refinery@114bfed]
* 14:38 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
* 18:18 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable kafka purges on wikitech gerrit:607590 IS-labs.php (duration: 01m 03s)
* 14:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw142[0-1].eqiad.wmnet
* 18:07 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Deploy MediaModeration on all production wikis gerrit:608753 (duration: 01m 07s)
* 14:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-9].eqiad.wmnet
* 17:14 XioNoX: set flex-flow-sizing to cr2-eqsin - [[phab:T248394|T248394]]
* 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw142[0-1].eqiad.wmnet
* 16:57 XioNoX: restart cr2-eqsin for software upgrade - [[phab:T243080|T243080]]
* 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw141[4-9].eqiad.wmnet
* 16:00 XioNoX: updating eqsin LVS BGP neighbors IPs - [[phab:T255766|T255766]]
* 14:15 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw142[0-1].eqiad.wmnet
* 15:16 XioNoX: re0.cr1-eqsin> request system power-off both-routing-engines - [[phab:T255766|T255766]]
* 14:12 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw141[4-9].eqiad.wmnet
* 15:15 XioNoX: disable BGP to pybal on cr1-eqsin - [[phab:T255766|T255766]]
* 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry[2005-2008].codfw.wmnet
* 15:13 XioNoX: disable cr1-eqsin transit/peering BGP - [[phab:T255766|T255766]]
* 13:54 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry[2005-2008].codfw.wmnet
* 15:09 XioNoX: bump eqsin-codfw ospf link cost - [[phab:T255766|T255766]]
* 13:32 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=registry200[5-8].codfw.wmnet,dc=codfw,cluster=docker-registry
* 15:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 15:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 15:03 XioNoX: move vrrp master to cr2-eqsin - [[phab:T255766|T255766]]
* 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 15:00 XioNoX: depool eqsin for routers work - [[phab:T255766|T255766]]
* 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 14:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
* 14:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:11 mutante: mw2380 - rebooting
* 14:04 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 14:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
* 13:37 hashar: contint1001 stopped zuul-merger for a test. started it again
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
* 13:35 hashar: Restarting zuul-merger on contint2001 # [[phab:T252310|T252310]]
* 12:24 moritzm: added btullis to pwstore
* 13:30 hashar@deploy1001: Finished deploy [zuul/deploy@00f69b3]: (no justification provided) (duration: 00m 08s)
* 12:06 mutante: mw2380 /puppetmaster: reimaged, revoking old cert, signing new cert, initial puppet run [[phab:T285603|T285603]]
* 13:30 hashar@deploy1001: Started deploy [zuul/deploy@00f69b3]: (no justification provided)
* 11:51 mutante: mw2380 - PXE booting - does not boot from hard disk
* 13:29 hashar@deploy1001: Finished deploy [zuul/deploy@00f69b3]: (no justification provided) (duration: 00m 32s)
* 11:28 mutante: powercycling mw2380, trying to make it boot
* 13:28 hashar@deploy1001: Started deploy [zuul/deploy@00f69b3]: (no justification provided)
* 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 13:16 hashar@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.39 (duration: 01m 04s)
* 11:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 13:15 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.39
* 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 13:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 13:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:33 jforrester@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/WikibaseMediaInfo: UploadWizard/WikibaseMediaInfo fix {{Gerrit|3fd2873}} for [[phab:T285579{{!}}T285579]] (duration: 00m 59s)
* 13:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1268.eqiad.wmnet
* 13:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:37 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: [[gerrit:702808{{!}}Fix handling of geEnabled flag (T285996)]] (duration: 00m 57s)
* 13:08 cdanis: ✔️ cdanis@netflow2001.codfw.wmnet ~ 🕘☕ sudo apt remove valgrind libc6-dbg
* 09:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1268.eqiad.wmnet
* 13:03 cdanis: [[phab:T256790|T256790]] ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘☕ sudo cumin 'netflow[3-5]001*' 'systemctl restart nfacctd'
* 09:24 godog: test thanos 0.21.1 locally on thanos-fe2001 and depool the host - [[phab:T285835|T285835]]
* 12:58 cdanis: [[phab:T256790|T256790]] ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘☕ sudo debdeploy deploy -u 2020-07-01-pmacct.yaml -s netflow
* 09:19 dcausse: restart blazegraph on wdqs1013
* 12:55 cdanis: [[phab:T256790|T256790]] ✔️ cdanis@apt1001.wikimedia.org ~ 🕘☕ sudo -E reprepro -C main include buster-wikimedia pmacct_1.7.2-3+wmf1_amd64.changes
* 09:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1267.eqiad.wmnet
* 12:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:04 mutante: decom'ing mw1267
* 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:02 moritzm: installing node-hosted-git-info security updates
* 11:47 ema: A:cp upgrade librdkafka1 to 0.11.6-1.1wmf1 and restart purged, varnishkafka [[phab:T256444|T256444]]
* 09:02 tgr: deploying emergency backport: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/702808
* 11:46 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T254315|T254315]] Wikidata: Define entity sources configuration [[gerrit:569258]] (duration: 01m 06s)
* 08:54 moritzm: installing golang-docker-credential-helpers security updates
* 11:32 Lucas_WMDE: EU B&C window done
* 08:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1267.eqiad.wmnet
* 11:24 lucaswerkmeister-wmde@deploy1001: Synchronized w/touch.php: Config: [[gerrit:608713{{!}}Fully set MW_NO_SESSION for browser metadata endpoints]], 4/4 (duration: 01m 06s)
* 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 11:22 lucaswerkmeister-wmde@deploy1001: Synchronized w/robots.php: Config: [[gerrit:608713{{!}}Fully set MW_NO_SESSION for browser metadata endpoints]], 3/4 (duration: 01m 03s)
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 11:21 lucaswerkmeister-wmde@deploy1001: Synchronized w/favicon.php: Config: [[gerrit:608713{{!}}Fully set MW_NO_SESSION for browser metadata endpoints]], 2/4 (duration: 01m 04s)
* 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 11:19 lucaswerkmeister-wmde@deploy1001: Synchronized w/extract2.php: Config: [[gerrit:608713{{!}}Fully set MW_NO_SESSION for browser metadata endpoints]], 1/4 (duration: 01m 16s)
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
* 11:07 Amir1: Changing datatype of several properties with mwscript extensions/Wikibase/repo/maintenance/changePropertyDataType.php ([[phab:T255241|T255241]])
* 08:03 moritzm: installing ipmitool security updates
* 11:07 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1268.eqiad.wmnet
* 11:02 ema: restbase2009 depooled [[phab:T256863|T256863]]
* 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1267.eqiad.wmnet
* 11:02 ema@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2009.codfw.wmnet
* 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
* 10:50 ema: power on restbase2009
* 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
* 10:45 jayme: draining and docker restart (one at a time) kubernetes[1001-1004].eqiad.wmnet - [[phab:T256786|T256786]]
* 07:25 dcausse: installing openjdk-8-dbg on wdqs1013
* 10:34 ema: power-cycle restbase2009
* 03:14 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo run-puppet-agent --force'`
* 10:17 XioNoX: renumber NTT transit links - [[phab:T254877|T254877]]
* 03:11 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo apt update'` fixed the issue
* 10:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:07 ryankemper: [[phab:T264053|T264053]] `Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install elasticsearch-madvise' returned 100: Reading package lists...` grr
* 10:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 03:07 ryankemper: [[phab:T264053|T264053]] `ryankemper@elastic2054:~$ sudo run-puppet-agent --force`
* 10:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:06 ryankemper: [[phab:T264053|T264053]] Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/702791; will run puppet on single host
* 10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 03:05 ryankemper: [[phab:T264053|T264053]] `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo disable-puppet "verify new deb package works - [[phab:T264053|T264053]]"'`
* 10:09 jayme: draining and docker restart (one at a time) kubernetes[2001-2004].codfw.wmnet
* 03:02 legoktm: uploaded elasticsearch-madvise_0.1~deb9u1_amd64.changes to stretch-wikimedia on apt1001
* 09:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 01:47 eileen: civicrm revision changed from {{Gerrit|e07c2be1a7}} to {{Gerrit|bb62188ec6}}, config revision is {{Gerrit|1739c53fcb}}
* 09:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 01:16 legoktm: uploaded elasticsearch-madvise 0.1 to apt.wm.o ([[phab:T264053|T264053]])
* 09:46 jayme: cordoning kubernetes[2001-2004].codfw.wmnet,kubernetes[1001-1004].eqiad.wmnet - [[phab:T256786|T256786]]
* 09:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:23 jayme: restarting dockerd on kubestage1002.eqiad.wmnet - [[phab:T256786|T256786]]
* 09:15 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:53 jayme: draining kubernetes staging node kubestage1001.eqiad.wmnet - [[phab:T256786|T256786]]
* 08:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:29 XioNoX: disable BGP to nfacct in eqiad - [[phab:T256790|T256790]]
* 08:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:08 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 08:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:01 vgutierrez: rolling restart of esams cache nodes to catch up on kernel upgrades
* 07:42 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:42 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 07:39 ema: cp2041: restart purged, varnishkafka after librdkafka1 upgrade to 0.11.6-1.1wmf1 [[phab:T256444|T256444]]
* 05:47 _joe_: restarting nfacctd on netflow1001, it's segfaulting
* 04:01 krinkle@deploy1001: Synchronized php-1.35.0-wmf.39/maintenance/findBadBlobs.php: {{Gerrit|I47c11190b665}} (duration: 01m 08s)
* 00:14 krinkle@deploy1001: Synchronized private/PrivateSettings.php: [[phab:T254795|T254795]] - Set $wmgXhguiDBuser and $wmgXhguiDBpasswor (duration: 01m 06s)


== 2020-06-30 ==
== 2021-07-01 ==
* 21:48 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 23:29 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:702777{{!}}Revert "deployment training: readme whitespace"]] (duration: 00m 56s)
* 21:46 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 23:21 thcipriani@deploy1002: Synchronized README: Config: [[gerrit:702774{{!}}deployment training: readme whitespace]] (duration: 00m 57s)
* 21:45 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 22:37 urbanecm: Start server-side upload for 1 video file ([[phab:T285182|T285182]])
* 21:43 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 22:36 urbanecm: Start server-side upload for 1 video file ([[phab:T285789|T285789]])
* 21:42 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 22:31 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:702704{{!}}Use train-versions.json to map from version to image tag (T282824)]] (duration: 00m 57s)
* 21:40 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 22:27 urbanecm: Start server-side upload for 1 video file ([[phab:T285682|T285682]])
* 21:40 crusnov@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:43 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: [[gerrit:702755{{!}}Temporarily disable notification for security patch failures]] (duration: 00m 57s)
* 21:38 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 19:45 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.12
* 21:38 crusnov@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 19:41 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 12s)
* 21:38 crusnov@cumin1001: START - Cookbook sre.hosts.reboot-single
* 19:39 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
* 19:19 hashar@deploy1001: rebuilt and synchronized wikiversions files: group 0 wikis to 1.35.0-wmf.39 # [[phab:T254176|T254176]]
* 19:35 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/tests/phpunit/includes/TitleMethodsTest.php: Backport: [[gerrit:702711{{!}}Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
* 18:31 cdanis: [[phab:T256790|T256790]] ✔️ cdanis@netflow2001.codfw.wmnet ~ 🕝☕ sudo apt install valgrind
* 19:34 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/Title.php: Backport: [[gerrit:702711{{!}}Consistently normalize Title::mFragment before setting (T285951)]] (duration: 01m 10s)
* 18:27 tgr: Morning deploys done
* 19:18 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/.pipeline/config.yaml: Backport: [[gerrit:702168{{!}}Trigger update-train-versions job at end of wmf-publish pipeline]] (duration: 01m 08s)
* 18:23 tgr@deploy1001: Synchronized php-1.35.0-wmf.39/extensions/ElectronPdfService/src/ElectronPdfServiceHooks.php: Backport: [[gerrit:608485{{!}}Hotfix: "Undefined index: print" (T256761)]] (duration: 01m 05s)
* 18:55 otto@deploy1002: Finished deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883] (duration: 05m 19s)
* 18:11 shdubsh: restart varnishmtail,atsmtail,ncredirmtail on ncredir,cp hosts in codfw and eqsin
* 18:50 otto@deploy1002: Started deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883]
* 18:05 cdanis: installing libc6-dbg on netflow2001 [[phab:T256790|T256790]]
* 18:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7995f7abe3b94eb0326064cbbd1d3111f8f21365}}: Use Vue.js for QuickSurveys on available wikis ([[phab:T285890|T285890]]) (duration: 01m 09s)
* 17:40 mdholloway: mobileapps deployments on k8s failing with timeouts; filed [[phab:T256786|T256786]]
* 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|654877f92fa18ae766d693630025c69400cad3ac}}: EventDispatcher: Ensure we fetch page content from the primary database ([[phab:T285895|T285895]]) (duration: 01m 12s)
* 17:37 cdanis: ✔️ cdanis@netflow2001.codfw.wmnet ~ 🕜☕ sudo systemctl restart nfacctd
* 18:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: {{Gerrit|6d9043087ec421e1321cd6797934928e2651b1c1}}: EventDispatcher: Ensure we fetch page content from the primary database ([[phab:T285895|T285895]]) (duration: 01m 14s)
* 17:33 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:18 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:28 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.12"
* 17:17 papaul: unplugging msw-c3 power to relocate port on PDU
* 16:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:09 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@f9df1af]: Update mobileapps to {{Gerrit|5c7611b9}} (duration: 03m 33s)
* 16:23 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: [[phab:T285959|T285959]] (duration: 01m 20s)
* 17:05 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@f9df1af]: Update mobileapps to {{Gerrit|5c7611b9}}
* 16:11 vgutierrez: restart varnish-fe on cp3059 - [[phab:T285953|T285953]]
* 16:57 cdanis: [[phab:T256444|T256444]] restarted purged on cp2030 and repooling
* 14:58 papaul: poweroff mw2380 for disk replacement
* 16:48 cdanis: [[phab:T256444|T256444]] ✔️ cdanis@cp2030.codfw.wmnet ~ 🕐☕ sudo depool
* 14:57 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
* 15:54 otto@deploy1001: Finished deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] - take 3 (duration: 00m 03s)
* 14:53 effie: depool mw2380 for disk repair - [[phab:T285603|T285603]]
* 15:54 otto@deploy1001: Started deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] - take 3
* 14:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:51 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:45 moritzm: installing glib2.0 security updates on buster
* 15:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts maps2002.codfw.wmnet
* 15:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:35 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts maps2002.codfw.wmnet
* 15:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:03 marostegui: Deploy schema change on s2 eqiad master [[phab:T276150|T276150]]
* 15:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1266.eqiad.wmnet
* 15:16 otto@deploy1001: Finished deploy [analytics/refinery@1112749]: roll back to {{Gerrit|1112749}} on an-launcher1002, git-fat not pulling artifacts (duration: 01m 21s)
* 12:39 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1266.eqiad.wmnet
* 15:14 otto@deploy1001: Started deploy [analytics/refinery@1112749]: roll back to {{Gerrit|1112749}} on an-launcher1002, git-fat not pulling artifacts
* 12:37 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:14 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1264-1265].eqiad.wmnet
* 15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:23 tgr: EU deploys done
* 15:10 moritzm: rebooting mwdebug* hosts for kernel security update
* 12:22 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/: Backport: [[gerrit:702402{{!}}Welcome tour: Mark as complete when notice is shown (T284800)]] [[gerrit:702404{{!}}SuggestedEdits: Return default JS data as 'noresults' (T285906)]] (duration: 01m 08s)
* 15:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:20 tgr@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: Backport: [[gerrit:702401{{!}}Welcome tour: Mark as complete when notice is shown (T284800)]] [[gerrit:702403{{!}}SuggestedEdits: Return default JS data as 'noresults' (T285906)]] (duration: 01m 09s)
* 15:03 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:19 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1264-1265].eqiad.wmnet
* 15:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:09 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1263.eqiad.wmnet
* 14:59 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:58 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1263.eqiad.wmnet
* 14:59 moritzm: rebooting failoid hosts for kernel update
* 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/: Backport: [[gerrit:702400{{!}}Stop using legacy entityNamespaces setting in onSetupAfterCache hook (T285472)]] (duration: 01m 15s)
* 14:49 otto@deploy1001: Finished deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] - take 3 (duration: 00m 03s)
* 11:46 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1262.eqiad.wmnet
* 14:49 otto@deploy1001: Started deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] - take 3
* 11:35 elukey: reboot ml-serve-ctrl200[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
* 14:47 otto@deploy1001: Finished deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] - take 2 (duration: 00m 03s)
* 11:35 marostegui: Deploy schema change on s8 eqiad master [[phab:T276150|T276150]]
* 14:47 otto@deploy1001: Started deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] - take 2
* 11:33 elukey: reboot ml-serve-ctrl100[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
* 14:44 hashar: Train blocked on Flow being broken: [[phab:T256761|T256761]] # [[phab:T254176|T254176]]
* 11:33 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1262.eqiad.wmnet
* 14:38 hashar@deploy1001: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.35.0-wmf.39" - [[phab:T256759|T256759]]
* 11:19 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:697851{{!}}Avoid using MWNamespace]] (duration: 01m 06s)
* 14:32 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:07 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:28 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:27 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:26 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:05 moritzm: installing remaining libgcrypt20 security updates
* 14:25 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.39
* 09:56 moritzm: installing remaining gnutls28 security updates
* 14:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:55 Amir1: start of clean up of autoreview logs in ruwiki ([[phab:T285608|T285608]])
* 14:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:47 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:21 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:36 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:36 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:35 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:15 moritzm: rebooting miscweb servers for kernel security update
* 09:35 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:05 marostegui: Deploy schema change on s1 eqiad (db1157) master [[phab:T277123|T277123]]
* 14:13 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:52 marostegui: Deploy schema change on s1 eqiad (db1163) master [[phab:T277123|T277123]]
* 14:11 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1261.eqiad.wmnet
* 14:10 otto@deploy1001: Finished deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]] (duration: 01m 56s)
* 08:28 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1261.eqiad.wmnet
* 14:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:23 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw126[2-6].eqiad.wmnet
* 14:09 hashar@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.39 (duration: 62m 30s)
* 08:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw126[2-6].eqiad.wmnet
* 14:08 otto@deploy1001: Started deploy [analytics/refinery@d63944e]: Deploying new camus wmf10 jar to an-launcher1002 for [[phab:T256370|T256370]]
* 08:13 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1261.eqiad.wmnet
* 14:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:11 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
* 13:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:06 marostegui: Deploy schema change on s4 eqiad (db1138) master [[phab:T277123|T277123]]
* 13:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 06:34 marostegui: Deploy schema change on s7 eqiad (db1136) master [[phab:T277123|T277123]]
* 13:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 06:31 marostegui: Deploy schema change on s2,s8 eqiad masters [[phab:T277123|T277123]]
* 13:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 05:57 marostegui: Deploy schema change on s5 eqiad master (db1130) [[phab:T277123|T277123]]
* 13:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 05:55 marostegui: Deploy schema change on s6 eqiad master (db1173) [[phab:T277123|T277123]]
* 13:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P16750 and previous config saved to /var/cache/conftool/dbconfig/20210701-055243-marostegui.json
* 13:37 moritzm: rebooting LDAP replicas for kernel security update
* 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P16749 and previous config saved to /var/cache/conftool/dbconfig/20210701-052702-marostegui.json
* 13:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 04:48 marostegui: Disconnect eqiad -> codfw replication from s1-s8
* 13:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 13:07 hashar@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.39
* 12:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 11:33 awight: EU BACON cooked
* 11:32 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:608478{{!}}Configure TeWü survey on dewiki (take 2) (T253112)]] (duration: 00m 58s)
* 11:32 jayme: restarted docker-reporter-base-images and docker-reporter-releng-images on deneb - [[phab:T253396|T253396]]
* 11:31 jayme: pushed a scratch docker image as docker-registry.discovery.wmnet/envoy-tls-local-proxy:dontuseme - [[phab:T253396|T253396]]
* 11:28 awight@deploy1001: Synchronized php-1.35.0-wmf.38/extensions/QuickSurveys: BACON: [[gerrit:608477{{!}}Embedded surveys are hidden when no element is available (T256627)]] (duration: 00m 56s)
* 11:26 awight@deploy1001: Synchronized php-1.35.0-wmf.38/extensions/FileImporter: BACON: [[gerrit:608476{{!}}Set Status error if permission check returns false. (T256428)]] (duration: 00m 58s)
* 11:13 ema: deneb: systemctl restart docker-reporter-base-images.service
* 10:59 ema: upload librdkafka 0.11.6-1.1wmf1 to buster-wikimedia https://phabricator.wikimedia.org/P11703 [[phab:T256444|T256444]]
* 10:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076', diff saved to https://phabricator.wikimedia.org/P11710 and previous config saved to /var/cache/conftool/dbconfig/20200630-105254-marostegui.json
* 10:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:41 ema: cp2040: restart purged and varnishkafka to use updated librdkafka1 [[phab:T256444|T256444]]
* 10:38 ema: cp2040: upgrade librdkafka1 to 0.11.6-1.1wmf1 https://phabricator.wikimedia.org/P11703 [[phab:T256444|T256444]]
* 10:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:30 hashar@deploy1001: Synchronized php-1.35.0-wmf.39/includes/specials/SpecialUndelete.php: Remove another use of PageArchive::getRevision - [[phab:T249982|T249982]] [[phab:T254176|T254176]] (duration: 00m 56s)
* 10:09 marostegui: Deploy schema change on db1076
* 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076', diff saved to https://phabricator.wikimedia.org/P11708 and previous config saved to /var/cache/conftool/dbconfig/20200630-100912-marostegui.json
* 10:04 vgutierrez: rolling restart of eqiad cache nodes to catch up on kernel upgrades
* 10:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 10:02 volker-e@deploy1001: Finished deploy [design/style-guide@e3fda83]: Deploy design/style-guide:  (duration: 00m 07s)
* 10:02 volker-e@deploy1001: Started deploy [design/style-guide@e3fda83]: Deploy design/style-guide:
* 09:47 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.37 (duration: 02m 20s)
* 09:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 09:21 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.36 (duration: 28m 11s)
* 08:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
* 08:53 hashar@deploy1001: clean aborted: Pruned MediaWiki: 1.35.0-wmf.36 (duration: 00m 00s)
* 08:51 hashar: Applied security patches to wmf/1.35.0-wmf.39 # [[phab:T254176|T254176]]
* 08:51 vgutierrez: rolling restart of codfw cp nodes after "re-formatting" nvme devices - [[phab:T256655|T256655]]
* 08:23 vgutierrez: repool cp3053 - [[phab:T256632|T256632]]
* 08:10 hashar: 1.35.0-wmf.39 was branched at {{Gerrit|e169e3dabcb2217809fc41ba44b43a39ae1a678e}} [[phab:T254176|T254176]]
* 08:05 marostegui: Stop MySQL on db1117:3322 to clone db1080 (this will trigger haproxy alerts) - [[phab:T256717|T256717]]
* 08:05 vgutierrez: powercycle cp3053 (unresponsive after reboot) - [[phab:T256632|T256632]]
* 08:01 jbond42: disable puppet to restart puppetmasters front ends
* 07:42 vgutierrez: reboot cp3053 - [[phab:T256632|T256632]]
* 05:51 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 05:13 marostegui: Deploy schema change on s8 codfw - [[phab:T256680|T256680]]
* 04:58 marostegui: remove pl_from index from db1141, db1121, db1148 - [[phab:T256684|T256684]]
* 04:57 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 04:56 marostegui: Remove plfrom from db1096:3316 and db1098:3316 - [[phab:T256684|T256684]]


== 2020-06-29 ==
== 2021-06-30 ==
* 23:28 eileen: civicrm revision changed from {{Gerrit|52a32f2d66}} to {{Gerrit|391d0fdf75}}, config revision is {{Gerrit|f1b4bdb7b7}}
* 23:28 urbanecm: Evening B&C window finished
* 22:00 sbassett: Deployed patch for [[phab:T256171|T256171]]
* 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|667d88054097b195208818aee15bb1eb58955437}}: Add Parsoid to wmgMonologChannels with warning level (duration: 01m 07s)
* 21:56 sbassett: Deployed patch for [[phab:T255918|T255918]]
* 23:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: {{Gerrit|8e719d54baa4c26aaa090e02503b0d9473301cce}}: Add Parsoid to wmgMonologChannels (duration: 00m 38s)
* 20:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3315 [[phab:T256679|T256679]]', diff saved to https://phabricator.wikimedia.org/P11699 and previous config saved to /var/cache/conftool/dbconfig/20200629-200002-marostegui.json
* 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8e719d54baa4c26aaa090e02503b0d9473301cce}}: Add Parsoid to wmgMonologChannels (duration: 01m 07s)
* 19:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 [[phab:T256679|T256679]]', diff saved to https://phabricator.wikimedia.org/P11698 and previous config saved to /var/cache/conftool/dbconfig/20200629-194327-marostegui.json
* 21:43 Amir1: deleting auto-review logs from test2wiki ([[phab:T285608|T285608]])
* 18:55 shdubsh: test mtail rc35+wmf2 on cp5001 - [[phab:T255776|T255776]]
* 21:40 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T284931|T284931]] [[phab:T284459|T284459]] [[phab:T284394|T284394]])
* 18:15 Urbanecm: Morning B&C done
* 21:29 cstone: civicrm revision changed from {{Gerrit|789c92d13b}} to {{Gerrit|e07c2be1a7}}
* 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c86fcd4}}: Add HTTP proxy to MediaModeration ([[phab:T247943|T247943]]) (duration: 00m 58s)
* 21:23 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T284931|T284931]] [[phab:T284459|T284459]] [[phab:T284394|T284394]])
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|aeb7b52}}: Setup rollbacker and mover on lijwiki ([[phab:T256109|T256109]]) (duration: 02m 05s)
* 19:06 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 07s)
* 17:30 sukhe: LDAP - added datn to groups wmde, nda - [[phab:T254442|T254442]]
* 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
* 15:43 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:57 legoktm: legoktm@mwmaint2002:~$ sudo systemctl start mediawiki_job_purge_parsercache_pc[123] # to start split purge jobs ahead of the timers
* 15:43 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 18:54 legoktm: legoktm@mwmaint2002:~$ sudo systemctl stop mediawiki_job_parser_cache_purging.service # to stop zombie service
* 15:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:53 Amir1: adding urbanecm as admin of newprojects mailing list
* 15:37 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 18:12 Jeff_Green: authdns-update to deploy A/PTR records for frdev1002.frack.eqiad.wmnet
* 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P11696 and previous config saved to /var/cache/conftool/dbconfig/20200629-153140-marostegui.json
* 17:57 thcipriani: restart ci jenkins following upgrade
* 15:20 gehel: repool wdqs1004 - caught up on lag
* 17:54 thcipriani: restart releases-jenkins following upgrade
* 14:50 hnowlan@deploy1001: Finished deploy [restbase/deploy@900bcf6]: Redeploy to fix transient error in gom wiktionary deploy (duration: 00m 06s)
* 17:16 moritzm: imported jenkins 2.289.2 to thirdparty/ci [[phab:T285532|T285532]]
* 14:50 hnowlan@deploy1001: Started deploy [restbase/deploy@900bcf6]: Redeploy to fix transient error in gom wiktionary deploy
* 16:30 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki 'Tech/Server_switch_2020' 'Tech/Server_switch' 'Martin Urbanec' --move-subpages --reason='per [[:phab:T285866]]' # [[phab:T285866|T285866]]
* 14:48 hnowlan@deploy1001: Finished deploy [restbase/deploy@900bcf6]: Enable gom wiktionary (duration: 13m 40s)
* 16:10 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 46s)
* 14:34 hnowlan@deploy1001: Started deploy [restbase/deploy@900bcf6]: Enable gom wiktionary
* 16:08 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 01s)
* 14:33 hnowlan@deploy1001: Finished deploy [restbase/deploy@900bcf6]: Enable gom wiktionary (duration: 17m 49s)
* 16:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 20s)
* 14:28 ema: A:cp rolling purged upgrade to 0.16 [[phab:T256479|T256479]]
* 16:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 16s)
* 14:22 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:608309{{!}}Add "E" as an alias of EntitySchema namespace on wikidata (T245529)]] (duration: 00m 57s)
* 16:03 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 17s)
* 14:20 ema: upload purged 0.16 to apt.wm.org [[phab:T256479|T256479]]
* 16:02 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating banwikisource ([[phab:T284389|T284389]])
* 14:16 hnowlan@deploy1001: Started deploy [restbase/deploy@900bcf6]: Enable gom wiktionary
* 16:00 urbanecm@deploy1002: Synchronized dblists: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 17s)
* 14:14 hnowlan@deploy1001: Finished deploy [restbase/deploy@ce5177e]: Enable gom wiktionary (duration: 20m 44s)
* 15:58 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 14s)
* 14:02 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Fix 'closed-labs' reading as 'closed' for static config (duration: 00m 56s)
* 15:57 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating banwikisource ([[phab:T284389|T284389]]) (duration: 01m 13s)
* 13:54 jforrester@deploy1001: Synchronized dblists/: Drop nonbetafeatures dblist, unused (duration: 00m 57s)
* 15:48 urbanecm@deploy1002: Synchronized langlist: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 16s)
* 13:54 hnowlan@deploy1001: Started deploy [restbase/deploy@ce5177e]: Enable gom wiktionary
* 15:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 16s)
* 13:50 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Drop 'nonbetafeatures' dblist from production reads (duration: 00m 56s)
* 15:46 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 13s)
* 13:49 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Switch uses from nonbetafeatures to lockeddown (duration: 00m 57s)
* 15:44 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 15s)
* 13:47 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: Add 'lockeddown' dblist to production reads (duration: 00m 57s)
* 15:43 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating shiwiki ([[phab:T284885|T284885]])
* 13:43 jforrester@deploy1001: Synchronized dblists/lockeddown.dblist: Add lockeddown dblist (unused as yet) (duration: 00m 59s)
* 15:41 urbanecm@deploy1002: Synchronized dblists: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 14s)
* 13:35 vgutierrez: depool cp3053 due to nvme hardware issues
* 15:40 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 14s)
* 13:02 XioNoX: test pfw3-codfw uplinks failover
* 15:38 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating shiwiki ([[phab:T284885|T284885]]) (duration: 01m 14s)
* 13:00 elukey: move archiva.wikimedia.org to archiva1002 (new buster vm); create archiva-old.wikimedia.org to archiva1001
* 15:31 urbanecm@deploy1002: Synchronized langlist: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 12s)
* 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P11693 and previous config saved to /var/cache/conftool/dbconfig/20200629-125824-marostegui.json
* 15:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 14s)
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1085', diff saved to https://phabricator.wikimedia.org/P11692 and previous config saved to /var/cache/conftool/dbconfig/20200629-125630-marostegui.json
* 15:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 12:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:27 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 12:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 15:26 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating dagwiki ([[phab:T284450|T284450]])
* 12:32 jayme: deleted all tags for docker-registry.wikimedia.org/envoy-tls-local-proxy from docker registry - [[phab:T253396|T253396]]
* 15:25 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=dagwiki --cluster=all # [[phab:T284450|T284450]]
* 12:20 marostegui: Stop MySQL on db2096 (codfw x1 master) for reimage [[phab:T254871|T254871]]
* 15:24 urbanecm@deploy1002: Synchronized dblists: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 12:03 cdanis: re-pool eqiad [[phab:T256512|T256512]]
* 15:22 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 13s)
* 11:59 cdanis: deployed {{Gerrit|I132075ee}} on cr1-eqiad [[phab:T256512|T256512]]
* 15:21 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating dagwiki ([[phab:T284450|T284450]]) (duration: 01m 16s)
* 11:58 cdanis: deployed {{Gerrit|I132075ee}} on cr2-eqiad [[phab:T256512|T256512]]
* 15:07 sukhe: restarted dnsdist.service and pdns-recursor.service on O:wikidough to install gnutls/gcrypt updates
* 11:58 cdanis: deployed {{Gerrit|I132075ee}} on cr2-eqiad
* 15:06 urbanecm: sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1'
* 11:41 cdanis: depool eqiad [[phab:T256512|T256512]]
* 13:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 11:15 awight: EU BACON cooked
* 13:26 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 11:08 marostegui: Deploy schema change on db1095:3312 (lag will show up)
* 13:26 moritzm: installing fluidsynth security updates on stretch
* 10:41 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:608284{{!}} Bumping portals to master (608284)]] (duration: 00m 57s)
* 13:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
* 10:41 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:608284{{!}} Bumping portals to master (608284)]] (duration: 00m 58s)
* 13:18 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
* 10:29 gehel: restart blazegraph on wdqs1004 + depool to catch up on lag
* 13:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
* 09:59 ema: cp2040: upgrade purged to 0.16 [[phab:T256479|T256479]]
* 13:11 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
* 09:59 jbond42: switch idp to memcached
* 13:04 mutante: switching docker-registry to nginx light variant [[phab:T164456|T164456]]
* 09:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
* 09:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:53 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
* 09:45 marostegui: Deploy schema change on dbstore1004:3312
* 12:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
* 09:11 jbond42: deploying shellcheck CI https://gerrit.wikimedia.org/r/c/operations/puppet/+/602693
* 12:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
* 08:59 marostegui: Compress InnoDB on db1089 (this will cause lag and will take a few days) - [[phab:T254462|T254462]]
* 12:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
* 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089 for InnoDB compression [[phab:T254462|T254462]]', diff saved to https://phabricator.wikimedia.org/P11690 and previous config saved to /var/cache/conftool/dbconfig/20200629-085854-marostegui.json
* 12:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Fully pool db1135 into s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11688 and previous config saved to /var/cache/conftool/dbconfig/20200629-084827-marostegui.json
* 12:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
* 08:40 ema: cp2034: restart purged [[phab:T256444|T256444]]
* 12:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
* 08:36 ema: cp4025: restart purged [[phab:T256444|T256444]]
* 12:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
* 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1135 into s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11687 and previous config saved to /var/cache/conftool/dbconfig/20200629-083631-marostegui.json
* 12:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
* 08:33 ema: cp1087, cp2033, cp2037, cp2039: repool after spending (way) more than 24h depooled [[phab:T256444|T256444]]
* 12:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1135 into s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11686 and previous config saved to /var/cache/conftool/dbconfig/20200629-082635-marostegui.json
* 12:17 kart_: Updated cxserver to 2021-06-30-112813-production ([[phab:T284900|T284900]], [[phab:T284885|T284885]])
* 08:24 marostegui: Deploy schema change on s2 codfw (lag will show up) [[phab:T253276|T253276]]
* 12:13 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
* 08:04 XioNoX: add term selected-paths to policy BGP_IXP_in on all routers
* 12:11 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 08:03 godog: prometheus eqiad -- lvextend --resizefs --size +200G vg-ssd/prometheus-ops
* 12:06 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly pool db1135 into s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11685 and previous config saved to /var/cache/conftool/dbconfig/20200629-080253-marostegui.json
* 12:01 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1135 (depooled) to s1 [[phab:T253217|T253217]]', diff saved to https://phabricator.wikimedia.org/P11684 and previous config saved to /var/cache/conftool/dbconfig/20200629-074611-marostegui.json
* 11:46 Lucas_WMDE: EU backport+config window done
* 07:16 XioNoX: push new pfw firewall rules - [[phab:T256170|T256170]]
* 11:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:701505{{!}}Remove $wmgWikibaseClientRepoConceptBaseUri (T257260)]] (2/2, beta) (disregard the earlier /3, I’m skipping the test file after all) (duration: 01m 04s)
* 07:13 marostegui: Deploy schema change on db1085 with replication to labs [[phab:T253276|T253276]]
* 11:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:701505{{!}}Remove $wmgWikibaseClientRepoConceptBaseUri (T257260)]] (1/3, prod) (duration: 01m 16s)
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P11683 and previous config saved to /var/cache/conftool/dbconfig/20200629-071236-marostegui.json
* 11:35 moritzm: rolling restart of FPM/Apache on mw canaries to pick up gnutls/gcrypt security updates
* 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1080 from MW', diff saved to https://phabricator.wikimedia.org/P11682 and previous config saved to /var/cache/conftool/dbconfig/20200629-065335-marostegui.json
* 11:11 moritzm: installing libgcrypt security updates on buster
* 06:50 elukey: execute gnt-instance remove an-launcher1001.eqiad.wmnet on ganeti1011 - [[phab:T256363|T256363]]
* 11:09 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug2001:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1' # clean up old l10n cache
* 06:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 11:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:701504{{!}}Stop setting Wikibase client repoConceptBaseUri (T257260)]] (duration: 01m 24s)
* 06:46 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 10:44 moritzm: installing gnutls security updates on buster
* 06:45 marostegui: Deploy MCR schema change on db1090:3312
* 10:31 godog: add 200G to prometheus/eqiad for 'ops' instance
* 06:35 elukey: force puppet run on ores* to overcome celery OOMs on some nodes
* 09:35 godog: start swiftrepl-mw on ms-fe2005 post-switchover (credentials were missing) - [[phab:T162123|T162123]]
* 04:57 marostegui: Stop MySQL on db1080 to clone db1135 [[phab:T253217|T253217]]
* 08:51 jelto: jelto@puppetmaster1001:~$ sudo puppet cert -s gitlab2001.wikimedia.org # approve puppet certificate request for gitlab2001, fingerprint checked
* 04:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:47 topranks: Removing BGP peers for AS48237 (Etihad Etisalat) and AS11404 (Wave Division Holdings) from cr2-eqiad (peers have left Equinix IX)
* 04:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:31 godog: remove sdf1 from thanos-be1003 in swift - [[phab:T285835|T285835]]
* 07:43 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thanos-be1003.eqiad.wmnet
* 07:43 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet