Difference between revisions of "Server Admin Log"

imported>Stashbot
(cdanis: T259621 ❌cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 homer 'cr*' commit 'drain cr2-esams transport link')
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
== 2020-08-25 ==
== 2021-08-03 ==
* 00:56 cdanis: [[phab:T259621|T259621]] ❌cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 homer 'cr*' commit 'drain cr2-esams transport link'
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:36 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr3-esams> request chassis routing-engine master switch
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:30 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr3-esams> request vmhost reboot re0
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 00:24 cdanis: [[phab:T259621|T259621]] cdanis@re1.cr3-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz re0
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:18 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr3-esams> request chassis routing-engine master switch
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:14 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr3-esams> request vmhost reboot re1
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 00:08 cdanis: [[phab:T259621|T259621]] cdanis@re0.cr3-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-17.3R3-S8.1.tgz re1
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master [[phab:T287574|T287574]]
* 09:12 moritzm: installing php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)


== 2020-08-24 ==
== 2021-08-02 ==
* 23:46 cdanis: depool esams [[phab:T259621|T259621]]
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:16 Urbanecm: Evening B&C window done
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 23:06 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|778f710bbbdb24730f7ce4c75d5ff1ca7a5ce3b3}}: Alternate configuration mechanism for Parsoid ([[phab:T241961|T241961]]) (duration: 00m 58s)
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:13 rzl@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:10 rzl@cumin1001: START - Cookbook sre.hosts.decommission
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:29 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Deployed additional mitigations for [[phab:T257687|T257687]] (duration: 00m 58s)
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:29 rzl: re-enabled puppet on 'R:File = /etc/nutcracker/nutcracker.yml' [[phab:T261154|T261154]]
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:25 rzl: disabling puppet on 'R:File = /etc/nutcracker/nutcracker.yml' to swap mc2028 out for mc2037 [[phab:T261154|T261154]]
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:10 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: cirrus: Increase weight of grants and research namespaces in metawiki search (duration: 00m 58s)
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 15:20 jynus: shutdown backup2001 [[phab:T260764|T260764]]
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 15:13 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 15:08 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 21:31 tzatziki: removing 1 file for legal compliance
* 15:04 vgutierrez: rolling restart of ats-tls to disable ECDHE-RSA-AES128-SHA - [[phab:T258405|T258405]]
* 21:16 tzatziki: removing 7 files for legal compliance
* 14:58 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 14:55 rzl: switchover test complete, puppet re-enabled on cumin1001
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:54 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:53 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:53 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 14:52 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 19:00 urbanecm: Morning B&C window completed
* 14:52 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 14:52 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 14:48 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:48 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:47 godog: powercycle ganeti5002 -- host down and nothing in console
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:43 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2020-08-24 14:43:35.570234
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 14:43 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 14:43 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 14:42 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=99)
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:42 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 14:42 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:41 rzl@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2020-08-24 14:41:55.754938
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 14:41 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 14:41 dcausse: creating cirrus indices for lldwiki
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:39 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:39 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 14:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 14:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 14:28 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 14:28 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 14:24 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 14:24 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:24 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:24 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 14:22 moritzm: installing libexif security updates on stretch
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 14:18 rzl: disabling puppet on cumin1001 and starting a test of the DC switchover automation, expect some SAL noise but no production impact
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 14:08 duesen: Deployed patch for [[phab:T260485|T260485]]
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 13:59 marostegui: Stop mysql on db1117:3325 to clone db1128 - [[phab:T260324|T260324]]
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for MCR change', diff saved to https://phabricator.wikimedia.org/P12327 and previous config saved to /var/cache/conftool/dbconfig/20200824-135538-marostegui.json
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3318 after MCR change', diff saved to https://phabricator.wikimedia.org/P12326 and previous config saved to /var/cache/conftool/dbconfig/20200824-133032-marostegui.json
* 12:20 mutante: gerrit servers: disabling puppet
* 13:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 13:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12325 and previous config saved to /var/cache/conftool/dbconfig/20200824-131305-marostegui.json
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 13:05 moritzm: installing imagemagick security updates on stretch
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12323 and previous config saved to /var/cache/conftool/dbconfig/20200824-130024-marostegui.json
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P12322 and previous config saved to /var/cache/conftool/dbconfig/20200824-125131-marostegui.json
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for MCR change', diff saved to https://phabricator.wikimedia.org/P12321 and previous config saved to /var/cache/conftool/dbconfig/20200824-122848-marostegui.json
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311 after MCR change', diff saved to https://phabricator.wikimedia.org/P12320 and previous config saved to /var/cache/conftool/dbconfig/20200824-122752-marostegui.json
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12319 and previous config saved to /var/cache/conftool/dbconfig/20200824-122050-marostegui.json
* 11:27 hashar: restarting Jenkins on contint2001
* 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12318 and previous config saved to /var/cache/conftool/dbconfig/20200824-121200-marostegui.json
* 11:27 hashar: restarting Jenkins on contint1001
* 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P12317 and previous config saved to /var/cache/conftool/dbconfig/20200824-120310-marostegui.json
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:01 Urbanecm: EU B&C window completed
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:01 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|8c380d65d760591099c296ae522b2e63953413aa}}: Enable tewiki as import source for tewikibooks ([[phab:T260107|T260107]]) (duration: 00m 57s)
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 11:58 XioNoX: test advertise CF tunnel endpoint on cr1-eqiad - [[phab:T259036|T259036]]
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:57 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5a6d025b04eb20787e8abbbdd56a3abb3818b82f}}: Add retrobibliothek.de to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T261012|T261012]]) (duration: 00m 56s)
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e1ae39afbb4d6f33e74782580db7dfee06d0097d}}: Enable mapframe at trwiki ([[phab:T260594|T260594]]) (duration: 00m 58s)
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:43 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/WikimediaEvents/modules/ext.wikimediaEvents/searchSatisfaction.js: {{Gerrit|1066ecbe2836e69211c905f597ad6b62241528c0}}: Enable MediaSearch A/B test ([[phab:T254388|T254388]]) (duration: 00m 56s)
* 11:13 urbanecm: EU B&C window completed
* 11:42 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/ContentTranslation/modules/publish/ext.cx.wikibase.link.js: {{Gerrit|74a87184408937bcdb4a27f1f563bbbdff45cf97}}: Publish: Fix broken wikidata linking ([[phab:T249458|T249458]]) (duration: 00m 58s)
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 11:39 Urbanecm: Purge 13 URLs with purgeList.php, see P12316 for list of them ([[phab:T260908|T260908]]; [[phab:T258552|T258552]]; [[phab:T261076|T261076]]; [[phab:T261110|T261110]])
* 11:08 moritzm: installing openjdk-11 security updates
* 11:34 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 11:32 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 11:32 arturo: add liblept5 1.76.0-1~bpo9+1 (and leptonica-progs) to stretch-wikimedia/component/tesseract-410-bpo ([[phab:T247422|T247422]])
* 07:24 moritzm: installing libsndfile security updates on buster
* 11:30 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|fe0449d244ee876e4fb64da630f0994ab114f248}}: {{Gerrit|74220d0943e6b32cce3c93dd5b9f8bbc63fa5d73}}: {{Gerrit|7db8a19c512cea84f3000463e9dfb6617857c9a6}}: Update Chinese wordmarks and taglines, update zhwikisource project logo ([[phab:T260908|T260908]]; [[phab:T258552|T258552]]; [[phab:T261076|T261076]]; [[phab:T261110|T261110]]) (duration: 00m 59s)
* 07:12 moritzm: installing aspell security updates
* 11:29 urbanecm@deploy1001: Synchronized static/images/: {{Gerrit|fe0449d244ee876e4fb64da630f0994ab114f248}}: {{Gerrit|74220d0943e6b32cce3c93dd5b9f8bbc63fa5d73}}: {{Gerrit|7db8a19c512cea84f3000463e9dfb6617857c9a6}}: Update Chinese wordmarks and taglines, update zhwikisource project logo ([[phab:T260908|T260908]]; [[phab:T258552|T258552]]; [[phab:T261076|T261076]]; [[phab:T261110|T261110]]) (duration: 00m 58s)
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:21 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:46 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:622116{{!}} Bumping portals to master (T128546)]] (duration: 00m 58s)
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)
* 10:45 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:622116{{!}} Bumping portals to master (T128546)]] (duration: 01m 00s)
* 10:43 moritzm: installing ruby2.3 security updates
* 10:12 moritzm: installing firejail security updates on mw canaries
* 09:58 oblivian@cumin1001: conftool action : set/weight=1; selector: dc=codfw,cluster=appserver,service=canary
* 09:46 XioNoX: add PNI to CF on cr1-eqiad with import/export NONE - [[phab:T259036|T259036]]
* 09:18 moritzm: restarting mw canaries to pick up libx11 update
* 09:13 moritzm: installing libx11 security updates on stretch
* 09:10 vgutierrez: repool cp5002
* 09:08 _joe_: restarting php-fpm on mw1344 (stuck in SIGILL for new children)
* 09:00 vgutierrez: restart ats-tls on cp5002
* 08:54 moritzm: installing net-snmp security updates on buster
* 08:52 ema: depool cp5002 due to icinga errors
* 08:24 moritzm: installing json-c security updates on buster
* 07:36 XioNoX: push new pfw policies - [[phab:T261007|T261007]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318, db1105:3311 for MCR change', diff saved to https://phabricator.wikimedia.org/P12315 and previous config saved to /var/cache/conftool/dbconfig/20200824-052916-marostegui.json


== 2020-08-23 ==
== 2021-07-31 ==
* 20:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 20:36 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
* 20:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:15 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 20:02 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime
* 11:23 gehel: repool wdqs1006 - catched up on lag


== 2020-08-22 ==
== 2021-07-30 ==
* 19:33 ryankemper: depooled wdqs1006 (still has 2.5 hours to catch up on)
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 19:31 ryankemper: pooled wdqs1006 now that lag has dissipated
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 07:36 gehel: restart blazegraph on wdqs1006 + depool to catchup on lag
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 05:24 legoktm: legoktm@mwmaint1002:~$ echo "https://releases.wikimedia.org/mediawiki/1.35/" {{!}} mwscript purgeList.php --wiki=aawiki
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 11:23 moritzm: installing libsndfile security updates on stretch
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json


== 2020-08-21 ==
== 2021-07-29 ==
* 17:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 17:39 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]]
* 17:00 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 16:58 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15 refs [[phab:T281157|T281157]]
* 16:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]]
* 16:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 16:17 zpapierski@deploy1001: Finished deploy [search/mjolnir/deploy@c80e2e7]: .. redeploy after theory verification (duration: 00m 50s)
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 16:16 zpapierski@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: .. redeploy after theory verification
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 16:15 zpapierski@deploy1001: deploy aborted: .. (duration: 00m 01s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 16:15 zpapierski@deploy1001: Started deploy [search/mjolnir/deploy@c80e2e7]: ..
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 13:25 jayme@cumin1001: conftool action : set/pooled=True; selector: dnsdisc=termbox,name=codfw
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 13:25 jayme@cumin1001: conftool action : set/pooled=False; selector: dnsdisc=termbox,name=codfw
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 09:02 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 09:01 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 01:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 01:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 01:15 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 01:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:11 vgutierrez: restart pybal on lvs2009
* 14:09 vgutierrez: restart pybal on lvs2010
* 14:07 vgutierrez: restart pybal on lvs2008
* 14:05 vgutierrez: restart pybal on lvs2007
* 13:59 vgutierrez: restart pybal on lvs1014
* 13:55 vgutierrez: restart pybal on lvs1015
* 13:52 _joe_: restarting pybal on lvs1016
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:52 moritzm: restarting Tomcat on idp-test
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}


== 2020-08-20 ==
== 2021-07-28 ==
* 22:31 eileen: civicrm revision changed from {{Gerrit|27d5900f7d}} to {{Gerrit|ce28723709}}, config revision is {{Gerrit|706cf3c898}}
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 22:20 eileen: civicrm revision is {{Gerrit|27d5900f7d}}, config revision is {{Gerrit|706cf3c898}}
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 22:20 mutante: permanently shut down tungsten.eqiad.wmnet [[phab:T260395|T260395]] [[phab:T158837|T158837]] [[phab:T180761|T180761]] [[phab:T224549|T224549]]
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 22:18 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16 refs [[phab:T281157|T281157]]
* 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 21:35 ejegg: updated fundraising CiviCRM from {{Gerrit|958a79f660}} to {{Gerrit|27d5900f7d}}
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 20:53 cdanis: repool eqsin
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 20:37 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 20:36 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 20:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 20:25 cdanis: cdanis@cr2-eqsin> request vmhost reboot
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 20:17 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 20:16 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 20:13 cdanis: cdanis@cr2-eqsin> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-18.2R3-S5.3.tgz
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 20:11 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 20:02 cdanis: depool eqsin for router upgrade
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 19:57 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 19:37 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 19:34 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 19:34 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 19:24 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 19:17 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 19:17 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:17 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.5 refs [[phab:T257973|T257973]]
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 19:08 mutante: restarted apache on cont2001 for integration.wikimedia.org docroot change
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 19:07 mutante: switching document root of integration.wikimedia.org to scap ([[phab:T149924|T149924]])
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 19:02 twentyafterfour: 1.36.0-wmf.5 has no known blockers and logspam is cleaned up, time to roll group2 wikis to wmf.5
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 18:42 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 18:42 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 18:19 mutante: ores1004 - starting failed celery-ores-worker
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 18:18 mutante: testreduce1001 - rt_client and vd_client now properly stopped by puppet [[phab:T257906|T257906]]
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 17:29 shdubsh: restart elasticsearch on logstash1012 (not 1020) -- high gc runtimes
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 17:28 shdubsh: restart elasticsearch on logstash1020 -- high gc runtimes
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 17:23 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 17:23 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.prepare-upgrade (exit_code=97)
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 17:23 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 17:22 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 17:22 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 16:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:29 moritzm: installing python2.7 security updates on stretch
* 16:48 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 13:08 moritzm: installing python3.5 security updates on stretch
* 16:46 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:45 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 11:27 moritzm: installing nginx security updates on thumbor*
* 16:43 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 16:40 _joe_: restarted apache2 on icinga1001
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 16:13 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 16:11 shdubsh: restart elasticsearch on logstash1011 -- long gc runs
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 16:10 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 16:08 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:02 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 14:55 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:06 oblivian@deploy1001: Finished deploy [ores/deploy@8540eec]: various configuration fixes (duration: 09m 03s)
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:57 oblivian@deploy1001: Started deploy [ores/deploy@8540eec]: various configuration fixes
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:53 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 13:53 oblivian@deploy1001: Finished deploy [ores/deploy@e860508]: switch everything to use envoy as a service proxy [[phab:T244843|T244843]] (duration: 14m 00s)
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 13:39 oblivian@deploy1001: Started deploy [ores/deploy@e860508]: switch everything to use envoy as a service proxy [[phab:T244843|T244843]]
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 13:26 oblivian@deploy1001: Finished deploy [ores/deploy@74677b6]: switch testwiki to use envoy as a service proxy [[phab:T244843|T244843]] (take 2) (duration: 11m 37s)
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 13:14 oblivian@deploy1001: Started deploy [ores/deploy@74677b6]: switch testwiki to use envoy as a service proxy [[phab:T244843|T244843]] (take 2)
* 08:27 Amir1: running several long-running queries against pc1007
* 13:11 oblivian@deploy1001: Finished deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy [[phab:T244843|T244843]] (duration: 11m 19s)
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:09 gehel: repool wdqs1007 - caught up on lag
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:00 oblivian@deploy1001: Started deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy [[phab:T244843|T244843]]
* 07:53 moritzm: installing aspell security updates on stretch
* 12:51 oblivian@deploy1001: Finished deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy [[phab:T244843|T244843]] (duration: 07m 03s)
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 12:44 oblivian@deploy1001: Started deploy [ores/deploy@a208a0e]: switch testwiki to use envoy as a service proxy [[phab:T244843|T244843]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 11:49 Lucas_WMDE: EU backport window done
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 11:44 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/AbuseFilter/includes/AbuseFilterHooks.php: {{Gerrit|d762e7b5526d91fe21e5980bc5e9f3be06a2f85c}}: Use $user param when filtering edits ([[phab:T258717|T258717]]) (duration: 01m 05s)
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 11:41 eileen: civicrm revision changed from {{Gerrit|6c9441a18e}} to {{Gerrit|958a79f660}}, config revision is {{Gerrit|706cf3c898}}
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 11:38 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/AbuseFilter/includes/AbuseFilterHooks.php: {{Gerrit|00da39b6913ac2eab600bbb61258472b60d2cbcb}}: Use $user param when filtering edits ([[phab:T258717|T258717]]) (duration: 01m 05s)
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 11:32 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/Wikibase/client/data-bridge/dist/: Backport: [[gerrit:621488{{!}}Don't try to load source maps in production (T260852)]] (duration: 01m 07s)
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 11:07 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix testwikidata depicts id & CirrusSearchUserTesting config (duration: 01m 06s)
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 11:07 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript emptyUserGroup.php --wiki=trwiki editor # [[phab:T260899|T260899]]
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 10:58 XioNoX: re-pool codfw - [[phab:T259621|T259621]]
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php
* 10:53 XioNoX: un-drain cr1-codfw - [[phab:T259621|T259621]]
* 10:45 XioNoX: cr1-codfw> request chassis routing-engine master switch - [[phab:T259621|T259621]]
* 10:26 hashar: Restarted zuul-merger instances on contint1001 and contint2001
* 10:24 hashar@deploy1001: Finished deploy [zuul/deploy@8a05b4d]: Support Gerrit replication events (duration: 00m 24s)
* 10:24 hashar@deploy1001: Started deploy [zuul/deploy@8a05b4d]: Support Gerrit replication events
* 10:21 XioNoX: cr1-codfw> request chassis routing-engine master switch - [[phab:T259621|T259621]]
* 10:12 XioNoX: reboot cr1-codfw:re1 (backup) for upgrade - [[phab:T259621|T259621]]
* 09:57 XioNoX: bump cr1-codfw OSPF metrics - [[phab:T259621|T259621]]
* 09:51 XioNoX: enable transit/peering and re-set normal OSPF values on cr2-codfw - [[phab:T259621|T259621]]
* 09:41 XioNoX: cr2-codfw> request chassis routing-engine master switch - [[phab:T259621|T259621]]
* 09:36 eileen: civicrm revision changed from {{Gerrit|cf9fadbeed}} to {{Gerrit|6c9441a18e}}, config revision is {{Gerrit|706cf3c898}}
* 09:33 XioNoX: reboot cr2-codfw:re0 (backup) for upgrade - [[phab:T259621|T259621]]
* 09:18 XioNoX: cr2-codfw> request chassis routing-engine master switch - [[phab:T259621|T259621]]
* 09:18 kormat: stress-testing db2125 [[phab:T260670|T260670]]
* 09:08 XioNoX: reboot cr2-codfw:re1 (backup) for upgrade - [[phab:T259621|T259621]]
* 09:03 kormat@cumin1001: dbctl commit (dc=all): 'Repool db2125 after host failure [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12303 and previous config saved to /var/cache/conftool/dbconfig/20200820-090313-kormat.json
* 08:52 kormat: removing /usr/bin/check_mariadb.py from all db hosts [[phab:T259516|T259516]]
* 08:52 XioNoX: disable transit/peering on cr2-codfw - [[phab:T259621|T259621]]
* 08:48 XioNoX: bump cr2-codfw OSPF metrics - [[phab:T259621|T259621]]
* 08:44 jynus: running analyze table on db1115's tendril.global_status_log, may cause some stalls on tendril/dbtree [[phab:T260876|T260876]]
* 08:41 XioNoX: depool codfw for routers upgrade - [[phab:T259621|T259621]]
* 08:31 XioNoX: enable transit/peering on cr3-knams - [[phab:T259621|T259621]]
* 08:21 XioNoX: reboot cr3-knams for upgrade - [[phab:T259621|T259621]]
* 08:07 XioNoX: disable transit/peering on cr3-knams - [[phab:T259621|T259621]]
* 07:39 hashar: contint2001: restarted zuul
* 07:29 hashar: contint1001: restarted zuul-merger
* 07:29 hashar@deploy1001: Finished deploy [zuul/deploy@5989ed0]: Upgrade gear from 0.7.0 to 1.15.1+wmf1 - [[phab:T258630|T258630]] (duration: 00m 13s)
* 07:28 hashar@deploy1001: Started deploy [zuul/deploy@5989ed0]: Upgrade gear from 0.7.0 to 1.15.1+wmf1 - [[phab:T258630|T258630]]
* 01:54 ejegg: re-enabled fundraising scheduled jobs
* 00:51 mutante: ms-be1039 - started failed ferm service
* 00:35 ejegg: stopped fundraising scheduled jobs
* 00:27 eileen: civicrm revision changed from {{Gerrit|c442a09153}} to {{Gerrit|cf9fadbeed}}, config revision is {{Gerrit|3cdffd4fc2}}


== 2020-08-19 ==
== 2021-07-27 ==
* 23:20 Urbanecm: Evening B&C window closed
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 23:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a80899948c26ca36b970b80fbad07600fe4ce92c}}: Enable VisualEditor in namespaces Draft and Wikiproject on hywiki ([[phab:T260825|T260825]]) (duration: 01m 05s)
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 22:41 eileen: civicrm revision changed from {{Gerrit|34f95a3311}} to {{Gerrit|c442a09153}}, config revision is {{Gerrit|3cdffd4fc2}}
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 21:27 eileen: civicrm revision changed from {{Gerrit|154519cc1f}} to {{Gerrit|34f95a3311}}, config revision is {{Gerrit|3cdffd4fc2}}
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 21:17 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 21:17 cdanis@cumin1001: START - Cookbook sre.network.cf
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 20:39 dpifke@deploy1001: Finished deploy [performance/arc-lamp@2ef1af7]: Deploy fixes for notifications and OOM prevention ([[phab:T259167|T259167]]) (duration: 00m 06s)
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 20:39 dpifke@deploy1001: Started deploy [performance/arc-lamp@2ef1af7]: Deploy fixes for notifications and OOM prevention ([[phab:T259167|T259167]])
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 19:43 ebernhardson: restart mjolnir-kafka-bulk-daemon on search-loader2001 with debug logging
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 19:20 mutante: testreduce1001 - re-enabled puppet, confirmed parsoid-rt service was now stopped properly by puppet while it runs as before on scandium, the previous parsoid-testing host. switching it over is now a Hiera one-liner. ([[phab:T257906|T257906]])
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 19:15 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.5 refs [[phab:T257973|T257973]] (duration: 01m 04s)
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 19:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.5 refs [[phab:T257973|T257973]]
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:02 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|60af096b80a8ef7bc94ec40ce203fd27b0c97f26}}: Add autopatrolled group at arzwiki ([[phab:T260761|T260761]]) (duration: 01m 04s)
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:52 mutante: testreduce1001 - disable puppet; stop parsoid-rt service
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 18:47 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|924a03bd624d6750a7e776e09713056cc45e5cc5}}: Add clinton.presidentiallibraries.us to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T259927|T259927]]) (duration: 01m 04s)
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 18:45 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|83b34e1bd1ed804a70f67e089580e082f89e2a0f}}: ClosedWikiProvider: Use testUserForCreation rather than testForAuthentication ([[phab:T258695|T258695]]) (duration: 01m 04s)
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 18:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|95d45f6e002df78d4860a711042d77a6b0bdecb9}}: Dont index Draft (118) and Draft talk (119) on hywiki ([[phab:T260804|T260804]]) (duration: 01m 04s)
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 18:32 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|803cb1a0d2c8cc6df8e4e88ab3c4d27eb71d01b3}}: Update taglines for various projects ([[phab:T258552|T258552]]) (duration: 01m 04s)
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 18:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: {{Gerrit|803cb1a0d2c8cc6df8e4e88ab3c4d27eb71d01b3}}: Update taglines for various projects ([[phab:T258552|T258552]]) (duration: 01m 06s)
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 18:25 mutante: rebooting webperf1002 VM on ganeti level (outside OS) to upgrade from 8 to 16GB RAM ([[phab:T260192|T260192]])
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 18:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 18:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bb4aa44b0bd5b2b33d190d3af81e038e5fc55e3f}}: Configure namespaces on commons to include categories ([[phab:T198716|T198716]]) (duration: 01m 04s)
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 18:21 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|b9043331c1c1b352256cffd471b9ff128806607c}}: Update project wordmarks ([[phab:T254788|T254788]]; sync 2/2) (duration: 01m 04s)
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 18:19 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: {{Gerrit|b9043331c1c1b352256cffd471b9ff128806607c}}: Update project wordmarks ([[phab:T254788|T254788]]; sync 1/2) (duration: 01m 06s)
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 18:15 mutante: rebooting webperf2002 VM on ganeti level (outside OS) to upgrade from 8 to 16GB RAM ([[phab:T260192|T260192]])
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 18:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a6f8354e7599a5e92bea060807065f5b42c540e5}}: Enable $wgMFNoindexPages for all wikis ([[phab:T255458|T255458]]) (duration: 01m 07s)
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 18:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:13 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 18:13 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 17:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 17:38 mutante: decom'ing releases2001.codfw.wmnet (
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 17:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 16:39 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 16:37 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 16:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 16:30 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 15:41 rzl: finished exercising the switchdc cookbooks with --live-test for now, all changes reverted including re-enabling puppet on cumin1001
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 15:38 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 15:37 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 15:34 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 15:34 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 15:33 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 15:33 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 15:31 jbond42: update java.security https://gerrit.wikimedia.org/r/c/operations/puppet/+/593467
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 15:30 oblivian@cumin1001: conftool action : set/ttl=300; selector: dnsdisc=api-rw
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 15:26 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 15:26 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 15:22 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 15:22 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 15:18 godog: prometheus codfw lvextend --resizefs --size +80G /dev/mapper/vg--ssd-prometheus--ops
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 15:17 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 15:17 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 15:16 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 15:16 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 15:14 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 15:14 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 15:08 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=99)
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 15:08 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 15:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 15:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 14:50 rzl@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=99)
* 14:33 mmandere: depool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 14:50 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 14:50 rzl: running the switchdc cookbooks with --live-test, simulating a switch to eqiad where we're already running, no production impact is expected
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 14:47 rzl@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 14:47 rzl@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 14:41 rzl: disable puppet on cumin1001 for switchdc testing
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 14:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 14:33 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 14:27 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:38 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 13:34 gehel: depooling wdqs1007 and restarting blazegraph
* 14:11 moritzm: installing aspell security updates
* 13:29 _joe_: depooling and disabling puppet on restbase1024 for further investigation
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 13:27 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:26 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:25 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:10 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:03 _joe_: building and uploading fluent-bit, ratelimit images
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:01 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 12:57 _joe_: building a new version of the base docker images
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 11:29 awight: EU bacon finished
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 11:28 effie: restart mwdebug* servers
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 11:08 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:621227{{!}}Fix typos in flaggedrevs comments ()]] (duration: 01m 19s)
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:22 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 08:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 08:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 08:43 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 08:36 XioNoX: update firewall policies on pfw - [[phab:T260585|T260585]]
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support. Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 08:35 jayme: running puppet on A:all-mw-eqiad
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 08:20 godog: switch grafana.w.o to grafana 7 in codfw - [[phab:T259143|T259143]]
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 08:19 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:23 Lucas_WMDE: EU backport+config window done
* 08:18 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 08:14 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 08:06 jayme: running puppet on A:all-mw-eqiad
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 07:46 godog: upgrade to grafana 7 on cloudmetrics hosts - [[phab:T259143|T259143]]
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 07:15 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 07:10 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 06:39 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 06:13 eileen: tools revision changed from {{Gerrit|b4ebd1e564}} to {{Gerrit|0b9d971bc4}}
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 06:07 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 06:04 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 06:03 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 06:00 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 05:55 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 05:53 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 05:47 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 05:37 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 05:31 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 03:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 03:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 08:57 _joe_: repooling mw225[12] for apis
* 02:53 cstone: civicrm revision changed from {{Gerrit|f5469d0a4c}} to {{Gerrit|154519cc1f}}
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 02:00 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 01:05 dpifke@deploy1001: Synchronized wmf-config/profiler.php: Deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/620139 (duration: 01m 18s)
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 00:49 dpifke@deploy1001: Synchronized wmf-config/ProductionServices.php: Disabling old XHGui backend ([[phab:T180761|T180761]]) (duration: 05m 13s)
* 08:36 jynus: reenabled puppet on mwmaint1002
* 00:15 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 07:52 jynus: disabling puppet on mwmaint1002
* 07:14 moritzm: installing krb security updates on buster
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json


== 2020-08-18 ==
== 2021-07-26 ==
* 23:45 catrope@deploy1001: Synchronized php-1.36.0-wmf.5/extensions/GrowthExperiments: Only fetch task card data for users in variant C/D ([[phab:T258021|T258021]]) (duration: 01m 05s)
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 23:44 catrope@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/GrowthExperiments: Only fetch task card data for users in variant C/D ([[phab:T258021|T258021]]) (duration: 01m 06s)
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}},
* 23:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1301.eqiad.wmnet
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 23:34 Urbanecm: Run scap pull at mw1301
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 23:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable static maps on testwiki, disable them on test2wiki (duration: 03m 22s)
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 23:32 mutante: rebooting mw1301 via mgmt
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 23:22 mutante: killed reboot-cluster on cumin1001
* 16:06 legoktm: depooled mw2336.codfw.mwnet, mgmt is down too. [[phab:T287394|T287394]]
* 23:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ac34f7274823e40d0c79752eb5ffe74c76856d04}}: Enable subpages in NS:0 in techconductwiki ([[phab:T260350|T260350]]) (duration: 05m 14s)
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 23:04 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1300.eqiad.wmnet
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 22:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 22:41 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 22:09 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 22:07 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 22:06 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:39 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 21:37 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 21:34 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 21:24 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 21:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 21:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 20:27 hashar: https://releases-jenkins.wikimedia.org/ changed agent from releases1001 to releases1002
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 20:14 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.5 refs [[phab:T257973|T257973]]
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 20:11 mutante: running puppet on cp-ats-ulsfo and switching releases-jenkins backend
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:07 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.5 refs [[phab:T257973|T257973]] (duration: 53m 12s)
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 20:00 mutante: releases1001 rm /etc/rsync.d/frag* & run puppet
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 19:54 mutante: rsyncing /var/lib/jenkins from releases1001 to releases1002/2002 with --delete [[phab:T256164|T256164]]
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 19:47 ejegg: updated payments-wiki from {{Gerrit|a7ee1790e0}} to {{Gerrit|ef7ebd08cb}}
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 19:44 hashar: Deleting old jobs from https://releases-jenkins.wikimedia.org/ # [[phab:T256164|T256164]]
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 19:41 hashar: releases1001: deleting old legacy mediawiki snapshots under /var/lib/jenkins/<nowiki>{</nowiki>REL1_27,REL1_29,REL1_30<nowiki>}</nowiki> # [[phab:T256164|T256164]]
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 19:14 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.5 refs [[phab:T257973|T257973]]
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 19:13 twentyafterfour: Promote testwikis from 1.36.0-wmf.4 to 1.36.0-wmf.5 refs [[phab:T257973|T257973]]
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 17:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 16:12 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw14(09{{!}}11{{!}}13).*
* 06:39 moritzm: installing krb5 security updates
* 16:03 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki
* 15:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 15:30 jayme@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 15:02 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 14:56 papaul: replacing msw-c1,c2 and c4
* 14:55 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104', diff saved to https://phabricator.wikimedia.org/P12293 and previous config saved to /var/cache/conftool/dbconfig/20200818-145337-marostegui.json
* 14:48 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw13(55{{!}}64{{!}}65).*
* 14:46 XioNoX: move v4 HE on cr3-ulsfo from peering to transit bgp group
* 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12292 and previous config saved to /var/cache/conftool/dbconfig/20200818-144415-marostegui.json
* 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12291 and previous config saved to /var/cache/conftool/dbconfig/20200818-143758-marostegui.json
* 14:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P12290 and previous config saved to /var/cache/conftool/dbconfig/20200818-142937-marostegui.json
* 14:28 marostegui: Stop MYSQL on db2125 for on-site maintenance - [[phab:T260670|T260670]]
* 13:54 marostegui: Revoke DELETE and CREATE from xhgui user on m2 [[phab:T260640|T260640]]
* 13:53 XioNoX: bump Zayo v4 BGP session in eqiad
* 13:49 XioNoX: move v4 HE on cr2-eqord from peering to transit bgp group
* 13:37 XioNoX: move v4 cr1-eqiad from peering to transit bgp group
* 13:04 kormat: disabling puppet on all db machines [[phab:T259516|T259516]]
* 12:57 _joe_: rebooting appservers in eqiad, 3 at a time
* 12:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 12:37 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 12:34 kormat: deploying wmfmariadbpy 0.4
* 12:21 jayme@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 11:53 XioNoX: add new icinga hosts to mr policies - [[phab:T260533|T260533]]
* 11:40 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 11:36 Lucas_WMDE: EU backport&config done
* 11:33 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:620888{{!}}Add Wikisource wordmark for trwikisource (T260658)]], part 2 (duration: 00m 55s)
* 11:32 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf '%s\n' 'https://en.wikipedia.org/static/images/mobile/copyright/wikisource-wordmark-tr.svg' {{!}} mwscript purgeList.php # [[phab:T260658|T260658]]
* 11:32 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/mobile/copyright/wikisource-wordmark-tr.svg: Config: [[gerrit:620888{{!}}Add Wikisource wordmark for trwikisource (T260658)]], part 1 (duration: 00m 55s)
* 11:24 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:595543{{!}}Enable Data Bridge on Catalan Wikipedia (T232584)]] (duration: 01m 01s)
* 11:06 jbond42: deploy net-snmp update to buster
* 10:56 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw229.*
* 10:55 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
* 10:54 marostegui: Reboot db2125 after running a full upgrade - [[phab:T260670|T260670]]
* 10:46 marostegui: Powercycle db2125 from the idrac [[phab:T260670|T260670]]
* 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2125 - host down [[phab:T260670|T260670]]', diff saved to https://phabricator.wikimedia.org/P12288 and previous config saved to /var/cache/conftool/dbconfig/20200818-100718-marostegui.json
* 09:45 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 09:43 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=mw2250.codfw.wmnet
* 09:40 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw214[234].*
* 09:40 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
* 09:35 kart_: Update cxserver to 2020-08-17-090424-production ([[phab:T259980|T259980]])
* 09:32 kartik@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 09:29 kartik@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 09:28 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 09:28 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=api_appserver,dc=codfw,name=mw214[02].*
* 09:26 volans: upgraded spicerack to v0.0.39 on cumin hosts
* 09:25 kartik@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 09:21 volans: uploaded spicerack_0.0.39-1+deb10u1 to apt.wikimedia.org buster-wikimedia
* 09:05 hashar: Restarting CI Jenkins
* 08:44 vgutierrez: restart ats-tls on cp5006
* 08:24 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
* 08:17 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 08:16 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 08:10 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1089', diff saved to https://phabricator.wikimedia.org/P12284 and previous config saved to /var/cache/conftool/dbconfig/20200818-080256-marostegui.json
* 07:58 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 07:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 07:45 godog: VictorOps ack'd incidents will re-trigger after 24h if not resolved - [[phab:T259465|T259465]]
* 07:44 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12283 and previous config saved to /var/cache/conftool/dbconfig/20200818-074325-marostegui.json
* 07:42 _joe_: performing rolling reboot of all codfw api servers
* 07:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12282 and previous config saved to /var/cache/conftool/dbconfig/20200818-072349-marostegui.json
* 07:19 oblivian@cumin1001: conftool action : set/pooled=yes; selector: name=mw213[5-9].codfw.wmnet
* 07:16 jynus: update rest of phabricator passwords [[phab:T250361|T250361]]
* 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1089', diff saved to https://phabricator.wikimedia.org/P12281 and previous config saved to /var/cache/conftool/dbconfig/20200818-071121-marostegui.json
* 07:08 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 07:07 godog: prometheus eqiad: add 100G to prometheus/global
* 07:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 07:01 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 07:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 06:53 twentyafterfour: phabricator maintenance successful
* 06:48 jynus: deploy another password change to phabricator service (potentially disruptive) [[phab:T250361|T250361]]
* 06:41 XioNoX: add cloudflare PNI IPs in eqiad - [[phab:T259036|T259036]]
* 06:21 jynus: deploy password change to phabricator service [[phab:T146055|T146055]]
* 06:06 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 06:01 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 05:52 _joe_: running puppet on mc1020 [[phab:T260622|T260622]]
* 05:02 twentyafterfour: phabricator appears to be fully functional
* 05:01 twentyafterfour: phabricator read-only ended
* 05:00 twentyafterfour: phabricator is now read-only
* 05:00 marostegui: Failover m3 (phabricator) database master from db1128 to db1132 - [[phab:T259589|T259589]]
* 04:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1088', diff saved to https://phabricator.wikimedia.org/P12279 and previous config saved to /var/cache/conftool/dbconfig/20200818-043241-marostegui.json
* 01:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1376.eqiad.wmnet
* 01:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1343.eqiad.wmnet
* 01:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1344.eqiad.wmnet
* 01:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 00:48 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 00:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 00:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 00:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1341.eqiad.wmnet
* 00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1340.eqiad.wmnet
* 00:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1339.eqiad.wmnet
* 00:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 00:24 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 00:15 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 00:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 00:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1315.eqiad.wmnet
* 00:06 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)


== 2020-08-17 ==
== 2021-07-24 ==
* 23:59 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
* 23:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 23:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 23:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1313.eqiad.wmnet
* 23:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 23:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 23:47 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 23:41 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 23:40 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1312.eqiad.wmnet
* 23:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 23:30 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 23:28 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1297.eqiad.wmnet
* 23:26 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 23:25 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 23:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 23:11 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 23:10 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 23:10 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1288.eqiad.wmnet
* 23:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 22:55 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 22:47 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 22:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 22:43 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 22:43 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1286.eqiad.wmnet
* 22:43 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 22:41 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 22:37 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 22:37 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1285.eqiad.wmnet
* 22:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 22:26 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 22:26 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 22:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1284.eqiad.wmnet
* 22:25 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 22:23 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 22:21 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 22:17 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 22:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 22:09 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 22:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1282.eqiad.wmnet
* 22:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 22:02 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 22:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1281.eqiad.wmnet
* 22:00 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 21:57 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki=aawiktionary --site-group wiktionary ([[phab:T259360|T259360]])
* 21:56 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 21:56 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 21:53 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add api-gateway.request stream config [[phab:T259736|T259736]], one host timed out (duration: 00m 55s)
* 21:49 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:48 ppchelko@deploy1001: sync-file aborted: Add api-gateway.request stream config [[phab:T259736|T259736]] (duration: 05m 01s)
* 21:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1278.eqiad.wmnet
* 21:46 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 21:43 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:42 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1279.eqiad.wmnet
* 21:42 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 21:38 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Further mitigations for [[phab:T257687|T257687]] (duration: 00m 57s)
* 21:38 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 21:36 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 21:34 effie: blocking temporarily traffic to mc1020
* 21:23 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:20 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:19 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1276.eqiad.wmnet
* 21:12 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2240.codfw.wmnet
* 21:08 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 20:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 20:47 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:38 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 20:20 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:19 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 20:02 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 19:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 19:39 dzahn@cumin1001: START - Cookbook sre.hosts.reboot-single
* 19:30 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:28 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 19:22 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 19:01 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki [[phab:T259002|T259002]], take 3 (duration: 02m 57s)
* 18:58 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki [[phab:T259002|T259002]], take 3
* 18:58 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki [[phab:T259002|T259002]], take 2 (duration: 11m 19s)
* 18:46 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki [[phab:T259002|T259002]], take 2
* 18:46 ppchelko@deploy1001: Finished deploy [restbase/deploy@7f16bad]: Add thankyouwiki [[phab:T259002|T259002]] (duration: 131m 17s)
* 18:46 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:43 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 18:39 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 18:32 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 18:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|808c17d28c5ebf5ed75f70c224d66129eb2edcd8}}: Change logo for lldwiki to match the requested one ([[phab:T259432|T259432]]) (duration: 00m 56s)
* 18:04 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|67e8f886cd1a9cd2b63ed69761bec6c52889a5b6}}: Add logo files for lldwiki ([[phab:T259432|T259432]]) (duration: 00m 56s)
* 17:17 cdanis@cumin1001: conftool action : set/pooled=yes; selector: name=mw1359.*
* 17:06 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 17:04 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=codfw,name=mw2246.codfw.wmnet
* 17:01 oblivian@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
* 16:36 jynus: restart backup2001, backup1001 one after the other
* 16:35 ppchelko@deploy1001: Started deploy [restbase/deploy@7f16bad]: Add thankyouwiki [[phab:T259002|T259002]]
* 16:31 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 16:27 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Update [[phab:T250887|T250887]] mitigations (duration: 00m 56s)
* 16:23 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - remove unneeded override for SearchSatisfaction - [[phab:T259163|T259163]] (duration: 00m 56s)
* 16:22 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:21 oblivian@cumin1001: conftool action : set/pooled=inactive; selector: cluster=jobrunner,dc=codfw,name=mw2250.codfw.wmnet
* 16:20 oblivian@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,dc=codfw
* 16:20 cdanis@cumin1001: START - Cookbook sre.hosts.downtime
* 16:14 cdanis@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1359.*
* 16:12 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
* 16:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 15:44 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: [[phab:T257943|T257943]] [[phab:T260556|T260556]] [[phab:T253478|T253478]] [[phab:T254490|T254490]] [[phab:T259054|T259054]]. take 3. feeds timed out (duration: 01m 31s)
* 15:43 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: [[phab:T257943|T257943]] [[phab:T260556|T260556]] [[phab:T253478|T253478]] [[phab:T254490|T254490]] [[phab:T259054|T259054]]. take 3. feeds timed out
* 15:43 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: [[phab:T257943|T257943]] [[phab:T260556|T260556]] [[phab:T253478|T253478]] [[phab:T254490|T254490]] [[phab:T259054|T259054]]. take 2. feeds timed out (duration: 20m 40s)
* 15:36 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ homer 'cr*' commit 'revert skipping RPKI validation for Jio AS55836 {{Gerrit|I0fd4683}} [[phab:T260452|T260452]]'
* 15:30 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕦☕ homer 'cr*-codfw*' commit 'revert skipping RPKI validation for Jio AS55836 {{Gerrit|I0fd4683}} [[phab:T260452|T260452]]'
* 15:22 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: [[phab:T257943|T257943]] [[phab:T260556|T260556]] [[phab:T253478|T253478]] [[phab:T254490|T254490]] [[phab:T259054|T259054]]. take 2. feeds timed out
* 15:22 ppchelko@deploy1001: Finished deploy [restbase/deploy@ddcecce]: [[phab:T257943|T257943]] [[phab:T260556|T260556]] [[phab:T253478|T253478]] [[phab:T254490|T254490]] [[phab:T259054|T259054]] (duration: 02m 30s)
* 15:19 ppchelko@deploy1001: Started deploy [restbase/deploy@ddcecce]: [[phab:T257943|T257943]] [[phab:T260556|T260556]] [[phab:T253478|T253478]] [[phab:T254490|T254490]] [[phab:T259054|T259054]]
* 15:08 ppchelko@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:06 ppchelko@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 15:04 ppchelko@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 14:57 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - all wikis (take 2) - [[phab:T254606|T254606]] (duration: 00m 53s)
* 14:57 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
* 14:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-cluster
* 14:44 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - all wikis - [[phab:T254606|T254606]] (duration: 00m 55s)
* 14:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: wgEventLoggingSchemas - schema revision version bump for erroring schemas - group0 - [[phab:T254606|T254606]] (duration: 00m 56s)
* 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12277 and previous config saved to /var/cache/conftool/dbconfig/20200817-141449-marostegui.json
* 14:09 marostegui: Sanitize thankyouwiki on db1124:3315, db2094:3315 - [[phab:T260551|T260551]]
* 14:03 marostegui: Sanitize lldwiki on db1124:3315 and db2094:3315 [[phab:T259436|T259436]]
* 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12276 and previous config saved to /var/cache/conftool/dbconfig/20200817-140229-marostegui.json
* 13:58 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T259432|T259432]])
* 13:54 Urbanecm: Creating thankyouwiki and lldwiki is done
* 13:54 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 52s)
* 13:54 Urbanecm: Create account Pcoombe (WMF) at thankyouwiki, email set to pcoombe@wikimedia.org ([[phab:T259002|T259002]])
* 13:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating thankyouwiki ([[phab:T259002|T259002]]) (duration: 00m 55s)
* 13:51 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating thankyouwiki ([[phab:T259002|T259002]]) (duration: 00m 55s)
* 13:49 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating thankyouwiki ([[phab:T259002|T259002]])
* 13:48 urbanecm@deploy1001: Synchronized dblists: Creating thankyouwiki ([[phab:T259002|T259002]]) (duration: 00m 55s)
* 13:47 marostegui: Deploy MCR change on db1104
* 13:47 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating thankyouwiki ([[phab:T259002|T259002]]) (duration: 00m 56s)
* 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104 for MCR change', diff saved to https://phabricator.wikimedia.org/P12275 and previous config saved to /var/cache/conftool/dbconfig/20200817-134701-marostegui.json
* 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12274 and previous config saved to /var/cache/conftool/dbconfig/20200817-134619-marostegui.json
* 13:46 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating thankyouwiki ([[phab:T259002|T259002]]) (duration: 00m 55s)
* 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088 after upgrading its mysql package', diff saved to https://phabricator.wikimedia.org/P12273 and previous config saved to /var/cache/conftool/dbconfig/20200817-134604-marostegui.json
* 13:41 jayme: imported td-agent-bit_1.5.3-0 to buster-wikimedia - [[phab:T260536|T260536]]
* 13:40 jayme: imported !log imported to buster-wikimedia
* 13:39 marostegui: Upgrade db1088 (s6) to a newer mysql version (10.4.14)
* 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for mysql upgrade', diff saved to https://phabricator.wikimedia.org/P12272 and previous config saved to /var/cache/conftool/dbconfig/20200817-133905-marostegui.json
* 13:34 jbond42: deploy json-c security update to buster
* 13:33 marostegui: Restart mysql on db2102 (testing new package)
* 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12271 and previous config saved to /var/cache/conftool/dbconfig/20200817-133043-marostegui.json
* 13:29 urbanecm@deploy1001: Synchronized langlist: Creating lldwiki ([[phab:T259432|T259432]]) (duration: 00m 54s)
* 13:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating lldwiki ([[phab:T259432|T259432]]) (duration: 00m 55s)
* 13:27 urbanecm@deploy1001: sync-file aborted: Creating lldwiki ([[phab:T259432|T259432]]) (duration: 00m 00s)
* 13:26 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating lldwiki ([[phab:T259432|T259432]]) (duration: 00m 53s)
* 13:25 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating lldwiki ([[phab:T259432|T259432]])
* 13:23 urbanecm@deploy1001: Synchronized dblists: Creating lldwiki ([[phab:T259432|T259432]]) (duration: 00m 56s)
* 13:22 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: Creating lldwiki ([[phab:T259432|T259432]]) (duration: 00m 56s)
* 13:20 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: Creating lldwiki ([[phab:T259432|T259432]]) (duration: 00m 55s)
* 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12270 and previous config saved to /var/cache/conftool/dbconfig/20200817-131307-marostegui.json
* 13:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 13:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 13:09 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P12269 and previous config saved to /var/cache/conftool/dbconfig/20200817-130127-marostegui.json
* 12:58 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:53 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depoool db1089 for MCR change', diff saved to https://phabricator.wikimedia.org/P12268 and previous config saved to /var/cache/conftool/dbconfig/20200817-124458-marostegui.json
* 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12267 and previous config saved to /var/cache/conftool/dbconfig/20200817-124409-marostegui.json
* 12:44 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 12:35 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 12:35 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:27 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12266 and previous config saved to /var/cache/conftool/dbconfig/20200817-122234-marostegui.json
* 12:21 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:20 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:19 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:19 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12265 and previous config saved to /var/cache/conftool/dbconfig/20200817-121600-marostegui.json
* 12:05 Lucas_WMDE: EU backport window done
* 12:02 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bjnwiki --fix {{!}} tee [[phab:T259429|T259429]]-fix
* 12:02 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php bjnwiki {{!}} tee [[phab:T259429|T259429]]-dryrun
* 12:01 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:620643{{!}}Set Portal and Portal_talk namespaces in bjnwiki as an extra namespace. (T259429)]] (duration: 00m 55s)
* 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P12264 and previous config saved to /var/cache/conftool/dbconfig/20200817-115741-marostegui.json
* 11:54 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:620513{{!}}Add Wiktionary wordmark for eswiktionary (T254059)]], part 2 (duration: 00m 57s)
* 11:53 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/mobile/copyright/wiktionary-wordmark-es.svg\n' {{!}} mwscript purgeList.php # [[phab:T254059|T254059]]
* 11:53 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/mobile/copyright/wiktionary-wordmark-es.svg: Config: [[gerrit:620513{{!}}Add Wiktionary wordmark for eswiktionary (T254059)]], part 1 (duration: 00m 56s)
* 11:46 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf 'https://en.wikipedia.org/static/images/project-logos/zh_classicalwiki%s.png\n' '' '-1.5x' '-2x' {{!}} mwscript purgeList.php # [[phab:T259006|T259006]]
* 11:45 lucaswerkmeister-wmde@deploy1001: Synchronized static/images/project-logos/: Config: [[gerrit:620510{{!}}Change the logo of lzh Wikipedia (T259006)]] (duration: 00m 55s)
* 11:40 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:620509{{!}}Add Turkish powered by MW and Wikimedia project icons for Turkish Wikiquote, Turkish Wiktionary, Turkish Wikisource and Turkish Wikibooks (T260493)]] (duration: 00m 55s)
* 11:35 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:620507{{!}}Add Turkish powered by MW and Wikimedia project icons (T260492)]] (duration: 00m 57s)
* 11:25 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:14 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 11:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:10 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 11:09 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [SDC] configure mediasearch A/B test (duration: 01m 08s)
* 11:08 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 10:57 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:54 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:52 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:52 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:52 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:51 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 10:49 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:47 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:42 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:38 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:36 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:35 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:35 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:30 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 10:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 10:29 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 10:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:14 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 10:13 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:13 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:06 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:56 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:55 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 09:52 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:48 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:45 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 09:42 jynus: updating compiler facts for cloud puppet compiler project to include new host dbprov2003
* 09:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:38 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:38 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:29 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:28 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:27 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:23 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 09:22 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 09:18 _joe_: running a full apt-get upgrade on mw1379-1380
* 09:18 _joe_: re-upgrading imagemagick on mw1378
* 09:16 _joe_: upgrading packages on mw1377
* 09:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:06 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:06 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:05 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:25 jayme: forcing a puppet run on all mw-api servers in eqiad - [[phab:T260329|T260329]]
* 07:52 _joe_: repooling mw1382
* 07:37 _joe_: running the same test on mw1382 [[phab:T260329|T260329]]
* 07:34 _joe_: repooling mw1381
* 07:15 _joe_: running the same test on mw1381 [[phab:T260329|T260329]]
* 07:15 _joe_: repooled mw1281
* 06:26 _joe_: stop testing on mw1281, [[phab:T260329|T260329]]
* 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:28 marostegui: Stop mysql on db1099:3311, db1099:3318 for reimage
* 05:28 _joe_: depooling mw1281 for testing for [[phab:T260329|T260329]]
* 05:25 marostegui: Deploy schema change on db1139:3311
* 05:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311, db1099:3318 for reimage and MCR change', diff saved to https://phabricator.wikimedia.org/P12263 and previous config saved to /var/cache/conftool/dbconfig/20200817-052147-marostegui.json


== 2020-08-16 ==
* 11:12 gehel: repooling wdqs1004 - caught up on lag

== 2021-07-23 ==
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 16:15 effie: enable puppet on mc-gp* hosts
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]


== 2020-08-15 ==
* 21:18 gehel: depooling wdqs1004 and restarting services, will wait to catch up on lag before repooling

== 2021-07-22 ==
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platfom migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 14:27 moritzm: installing libwebp security updates on stretch
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 11:36 Lucas_WMDE: EU backport+config window done
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:42 effie: restart pybal on lvs2010 and lvs1016
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE


== 2020-08-14 ==
== 2021-07-21 ==
* 19:41 effie: restart mwdebug1002
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 16:58 cdanis: done deploying 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8' to all routers [[phab:T260449|T260449]]
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 16:44 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr2-esams*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 16:39 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr1-codfw*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 16:36 cdanis: ❌cdanis@cumin1001.eqiad.wmnet ~ 🕧☕ homer 'cr2-codfw*' commit 'allow nameservers of Jio AS55836 to skip RPKI validation I9fcff8'
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 02:41 eileen: tools revision changed from {{Gerrit|9a89f45974}} to {{Gerrit|b4ebd1e564}}
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 20:27 dancy: testing upcoming Scap release on beta
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 10:50 moritzm: installing systemd security updates on bullseye
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 10:14 effie: enable puppet on mw* servers
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 08:45 effie: disable puppet on codfw mw hosts to deploy 702592
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 08:17 effie: enable puppet on alert*
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 07:16 godog: powercycle ms-be2048
* 07:03 moritzm: installing systemd security updates on stretch
* 06:51 effie: restart memcached on eqiad mc* hosts
* 06:51 effie: enable puppet on mc* hosts
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .


== 2020-08-13 ==
== 2021-07-20 ==
* 23:39 tzatziki: removing 3 files for legal compliance
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 22:03 mutante: switching xhgui from tungsten to xhgui1001 - ran puppet on webperf*001 - [[phab:T180761|T180761]] [[phab:T158837|T158837]]
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 21:54 andrew@deploy1001: Finished deploy [horizon/deploy@f3dcb29]: fix proxy in project-local domain --bug [[phab:T260388|T260388]] (duration: 03m 53s)
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 21:50 andrew@deploy1001: Started deploy [horizon/deploy@f3dcb29]: fix proxy in project-local domain --bug [[phab:T260388|T260388]]
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 21:11 mutante: rsyncing /var/lib/jenkins from releases1001 to releases1002 and then all other releases* servers. 57GB, overwriting existing data from manual config ([[phab:T247652|T247652]])
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 20:53 kormat: dropping xhgui.xhgui on m2
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 19:35 thcipriani@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/DiscussionTools: [[gerrit:620030{{!}}Revert new reply API (again)]] [[phab:T259855|T259855]] (duration: 00m 57s)
* 17:06 rzl: enabled puppet on A:mw
* 18:06 herron: restarted ES on logstash1010
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 18:05 dpifke@deploy1001: Synchronized wmf-config/ProductionServices.php: Enabling new XHGui backend ([[phab:T180761|T180761]]) (duration: 00m 56s)
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 17:16 hnowlan: deployed ATS and varnish rules to route api.wikimedia.org
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 16:26 hnowlan: created api.wikimedia.org
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 15:49 hnowlan: moving api-gateway service to state production. critical set to false
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 15:41 herron: restart ES on logstash1012
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:56 fdans@deploy1001: Finished deploy [analytics/refinery@ba1a439]: Regular analytics weekly train (duration: 11m 34s)
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 14:45 ema: repool mw1382 with kernel memory accounting disabled [[phab:T260281|T260281]]
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:45 fdans@deploy1001: Started deploy [analytics/refinery@ba1a439]: Regular analytics weekly train
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 14:41 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 14:40 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 14:38 ema: reboot mw1382 with kernel memory accounting disabled [[phab:T260281|T260281]]
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 14:34 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 14:34 _joe_: rebooting mw1381 with a newer kernel, mw1383 as control with the old kernel [[phab:T260329|T260329]]
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 14:33 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 14:31 _joe_: installing kernel 4.19.0-0.bpo.9 on mw1381 [[phab:T260329|T260329]]
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:05 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 14:00 elukey: create schema[12]00[34] in ganeti - [[phab:T260347|T260347]]
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 13:59 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 13:58 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
* 13:53 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 13:51 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
* 13:46 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 13:45 hnowlan: moving api-gateway service to monitoring_setup
* 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 13:44 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:53 urbanecm: Start server-side upload for 7 large PNG files ([[phab:T285708|T285708]])
* 13:44 hashar: Gracefully restarting Zuul
* 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
* 13:39 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 13:10 _joe_: forcing a puppet run on the api appservers in eqiad [[phab:T260329|T260329]]
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 13:07 oblivian@deploy1001: Synchronized wmf-config/CommonSettings.php: revert enabling of lilypond (again) [[phab:T257091|T257091]] [[phab:T260329|T260329]] (duration: 00m 59s)
* 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 11:24 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 11:20 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 14:46 vgutierrez: depool dns1002 - [[phab:T286069|T286069]]
* 11:09 hnowlan: restarting pybal on lvs2010 [[phab:T254908|T254908]]
* 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 11:09 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 11:06 hnowlan: restarting pybal on lvs2009 [[phab:T254908|T254908]]
* 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 11:05 hnowlan: restarting pybal on lvs1016 [[phab:T254908|T254908]]
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 11:05 jayme: depool mw1380 for downgrade of poppler-utils,libpoppler-glib8,libpoppler64,curl,libcurl3,libcurl3-gnutls,libpython3.5,python3.5,libpython3.5-stdlib,python3.5-minimal,libpython3.5-minimal,imagemagick-6-common,libmagickcore-6.q16-3,libmagickwand-6.q16-3,imagemagick-6.q16,imagemagick,e2fslibs,e2fsprogs,libcomerr2,libss2 and reboot - [[phab:T260329|T260329]]
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 11:05 hnowlan: restarting pybal on lvs1015 [[phab:T254908|T254908]]
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 11:04 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 10:42 hnowlan: Moving api-gateway service from service_setup to lvs_setup and running puppet on LVS servers
* 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:17 jayme: depool mw1379 for downgrade of poppler-utils,libpoppler-glib8,libpoppler64,curl,libcurl3,libcurl3-gnutls,libpython3.5,python3.5,libpython3.5-stdlib,python3.5-minimal,libpython3.5-minimal,imagemagick-6-common,libmagickcore-6.q16-3,libmagickwand-6.q16-3,imagemagick-6.q16,imagemagick,e2fslibs,e2fsprogs,libcomerr2,libss2 and reboot - [[phab:T260329|T260329]]
* 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 10:04 XioNoX: re-order OSPF interfaces on all routers (now partially netbox driven)
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 09:37 ayounsi@deploy1001: Finished deploy [homer/deploy@89636df]: Homer release v0.2.5 (duration: 03m 03s)
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 09:34 ayounsi@deploy1001: Started deploy [homer/deploy@89636df]: Homer release v0.2.5
* 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082', diff saved to https://phabricator.wikimedia.org/P12247 and previous config saved to /var/cache/conftool/dbconfig/20200813-085547-marostegui.json
* 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:45 _joe_: downgrading imagemagick on mw1378 [[phab:T260329|T260329]]
* 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 08:43 _joe_: downgrading imagemagick on mw1378 [[phab:T260281|T260281]]
* 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 08:38 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 08:38 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 07:55 _joe_: downgrading curl/libcurl3/libcurl3-gnutls on mw1377 [[phab:T260329|T260329]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12246 and previous config saved to /var/cache/conftool/dbconfig/20200813-074528-marostegui.json
* 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
* 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P12244 and previous config saved to /var/cache/conftool/dbconfig/20200813-071943-marostegui.json
* 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10{{!}}0[1-9]).codfw.wmnet
* 07:16 marostegui: Stop replication on db1082 to remove triggers on sanitarium for MCR changes
* 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P12243 and previous config saved to /var/cache/conftool/dbconfig/20200813-071545-marostegui.json
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 06:48 marostegui: Deploy MCR change on dbstore1003:3311
* 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - [[phab:T285643|T285643]]
* 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126', diff saved to https://phabricator.wikimedia.org/P12242 and previous config saved to /var/cache/conftool/dbconfig/20200813-060135-marostegui.json
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 05:43 marostegui: Stop MySQL on db2135 (codfw master), haproxy irc alert will fire [[phab:T260324|T260324]]
* 12:44 moritzm: installing systemd security updates on buster
* 05:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12241 and previous config saved to /var/cache/conftool/dbconfig/20200813-052859-marostegui.json
* 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12240 and previous config saved to /var/cache/conftool/dbconfig/20200813-051222-marostegui.json
* 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P12239 and previous config saved to /var/cache/conftool/dbconfig/20200813-050107-marostegui.json
* 11:58 Lucas_WMDE: EU config+backport window done
* 02:56 mutante: testreduce1001 - systemctl reset-failed ; fix parsoid-vd systemd state and icinga alert
* 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (3/3) (duration: 00m 56s)
* 00:37 mutante: removing jenkins_service_running checks from secondary servers where it's stopped, manually from icinga config, running puppet on icinga
* 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 00:14 mutante: re-enabling puppet on releases* servers
* 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (2/3) (duration: 00m 56s)
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (1/3) (duration: 00m 56s)
* 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 3/3) (duration: 00m 56s)
* 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 2/3) (duration: 00m 56s)
* 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 1/3) (duration: 00m 57s)
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705498{{!}}Add patroller group for ckbwiki (T285221)]] (duration: 00m 57s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (2/2, beta) (duration: 00m 56s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (1/2, prod) (duration: 00m 57s)
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704867{{!}}Update config for language switching on pilot wikis (T286459)]] (duration: 00m 59s)
* 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
* 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
* 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
* 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
* 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
* 03:17 eileen: civicrm revision changed from {{Gerrit|20e9ef6bbb}} to {{Gerrit|819c11307d}}, config revision is {{Gerrit|bb405c5232}}


== 2020-08-12 ==
== 2021-07-19 ==
* 23:44 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:48 urbanecm: Deploy security patch for [[phab:T286884|T286884]]
* 23:41 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:29 vgutierrez: pool text@codfw - [[phab:T286921|T286921]]
* 23:40 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:39 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
* 23:37 wkandek: reboot mw1372
* 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467{{!}}prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)
* 23:36 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
* 23:36 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448{{!}}Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)
* 23:32 wkandek: reboot mw1373
* 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - [[phab:T286921|T286921]]
* 23:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - [[phab:T286921|T286921]]
* 23:31 wkandek: reboot mw1371
* 18:46 brennen: gerrit1001: restarting gerrit
* 23:31 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:40 vgutierrez: stop pybal on lvs2009 - [[phab:T286921|T286921]]
* 23:31 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 18:38 brennen: re-enabling puppet on gerrit1001
* 23:30 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - [[phab:T286921|T286921]]
* 23:28 wkandek: reboot mw1384
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 23:27 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
* 23:27 wkandek: reboot mw1385
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P<nowiki>{</nowiki>cloudelastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 23:26 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - [[phab:T286921|T286921]]
* 23:25 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:20 vgutierrez: enabling pybal on lvs2007 - [[phab:T286921|T286921]]
* 23:24 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:19 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 23:22 wkandek: reboot mw1370
* 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
* 23:22 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 18:06 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:705437{{!}}pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s)
* 23:19 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 23:18 wkandek: reboot mw1369
* 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 23:18 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 23:17 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 23:17 wkandek: reboot mw1387
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
* 23:16 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 23:16 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
* 23:16 wkandek: reboot mw1389
* 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 23:15 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
* 23:14 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 23:09 wkandek: reboot mw1368
* 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
* 23:09 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
* 23:08 wkandek: reboot mw1367
* 17:30 volans: running puppet on elastic2038 after network was restored
* 23:08 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
* 23:07 wkandek: reboot mw1391
* 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 23:07 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
* 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 23:06 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 23:05 ejegg: updated Fundraising CiviCRM from {{Gerrit|72452e28a9}} to {{Gerrit|f5469d0a4c}}
* 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 23:05 wkandek: reboot mw1393
* 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 23:04 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 23:04 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:23 volans: running authdns-update to force-update authdns2001
* 23:01 wkandek: reboot mw1395
* 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 23:01 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:00 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:21 XioNoX: remove ns1 redirect - [[phab:T286787|T286787]]
* 22:53 wkandek: reboot mw1397
* 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
* 22:53 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:52 wkandek: reboot mw1366
* 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
* 22:52 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
* 22:52 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:10 XioNoX: enable asw-a2-codfw access ports - [[phab:T286787|T286787]]
* 22:52 wkandek: reboot mw1365
* 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - [[phab:T286787|T286787]]
* 22:51 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
* 22:51 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
* 22:51 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
* 22:47 wkandek: reboot mw1399
* 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
* 22:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
* 22:46 wkandek: reboot mw1364
* 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
* 22:46 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
* 22:45 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:40 XioNoX: update asw-a2-codfw serial number - [[phab:T286787|T286787]]
* 22:44 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 22:42 wkandek: reboot mw1401
* 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
* 22:42 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 22:41 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 22:41 wkandek: reboot mw1355
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 22:40 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 22:40 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
* 22:38 wkandek: reboot mw1354
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
* 22:38 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 22:36 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
* 22:36 wkandek: reboot mw1396
* 16:21 mutante: depooled logstash2021 for dcops maintenance work
* 22:36 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
* 22:35 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
* 22:32 wkandek: reboot mw1353
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 22:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
* 22:31 wkandek: reboot mw1352
* 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
* 22:31 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|310be45f7}} (duration: 00m 57s)
* 22:31 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
* 22:30 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 22:29 wkandek: reboot mw1348
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
* 22:29 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
* 22:28 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2bdfbd258e}} (duration: 00m 57s)
* 22:26 wkandek: reboot 1347
* 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 22:26 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I069c7b53}} (duration: 00m 58s)
* 22:25 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 22:23 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 22:22 wkandek: reboot mw1350
* 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
* 22:22 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:10 godog: +100G to prometheus/ops in codfw
* 22:21 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:59 vgutierrez: rolling restart of eqiad pybal instances
* 22:20 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 22:19 wkandek: reboot mw1346
* 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 22:19 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 14:42 vgutierrez: rolling restart of codfw pybal instances
* 22:18 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:33 vgutierrez: rolling restart of eqsin pybal instances
* 22:14 wkandek: reboot mw1345
* 14:23 vgutierrez: rolling restart of ulsfo pybal instances
* 22:13 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 22:12 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
* 22:12 wkandek: reboot mw1349
* 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 22:12 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 22:11 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
* 22:08 wkandek: reboot mw1333
* 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 22:07 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:07 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1330.eqiad.wmnet
* 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
* 22:03 wkandek: reboot mw1344
* 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 22:03 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 22:02 wkandek: reboot mw1343
* 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
* 22:02 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
* 22:02 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
* 22:00 wkandek: reboot mw1332
* 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 22:00 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
* 21:56 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 21:55 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 21:53 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 21:50 wkandek: reboot mw1331
* 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
* 21:50 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
* 21:48 wkandek: reboot mw1342
* 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 21:47 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
* 21:46 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:40 moritzm: installing bluez security updates
* 21:46 wkandek@cumin1001: conftool action : set/pooled=yes; selector: name=mw1340.eqiad.wmnet
* 11:31 Lucas_WMDE: EU backport+config window done
* 21:41 wkandek@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:703205{{!}}Add config for updated PropertySuggester beta cluster (T285098)]] (beta-only) (duration: 00m 57s)
* 21:40 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 21:39 wkandek@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 21:39 wkandek: reboot mw1341
* 09:52 moritzm: imported megacli for bullseye-wikimedia [[phab:T282272|T282272]] [[phab:T275873|T275873]]
* 21:39 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
* 21:37 wkandek@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
* 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
* 21:37 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 08:15 vgutierrez: depool codfw text traffic
* 21:36 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
* 21:33 wkandek: reboot mw1329
* 03:26 twentyafterfour: restarted phd on phab1001
* 21:33 wkandek: reboot mw1328
* 03:25 twentyafterfour: investigating PHD failure
* 21:32 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:32 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:29 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:28 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:28 ejegg: updated payments-wiki from {{Gerrit|77ff5d70fc}} to {{Gerrit|a7ee1790e0}}
* 21:25 wkandek: reboot mw1340
* 21:25 wkandek@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:23 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:21 wkandek: reboot mw1339
* 21:20 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:20 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:15 wkandek: reboot mw1327
* 21:15 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:14 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:13 wkandek: reboot mw1326
* 21:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:11 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:11 wkandek: reboot mw1317
* 21:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:10 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:05 wkandek: reboot mw1316
* 21:04 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:03 wkandek: reboot mw1325
* 21:03 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:02 wkandek: reboot mw1324
* 21:02 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:02 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 21:01 wkandek: reboot mw1315
* 21:01 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 21:00 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:57 wkandek: reboot mw1323
* 20:57 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:54 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:52 wkandek: reboot mw1322
* 20:52 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:51 wkandek: reboot mw1314
* 20:51 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:50 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:50 wkandek: reboot mw1313
* 20:50 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:49 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:48 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:44 wkandek: reboot mw1312
* 20:44 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:43 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:43 wkandek: reboot mw1321
* 20:42 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:41 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:40 wkandek: reboot mw1297
* 20:40 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:39 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:39 wkandek: reboot mw1320
* 20:39 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:38 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:34 wkandek: reboot mw1290
* 20:34 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:33 wkandek: reboot mw1319
* 20:33 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:32 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:29 wkandek: reboot mw1275
* 20:29 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:28 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:26 wkandek: reboot mw1289
* 20:25 wkandek: reboot mw1288
* 20:25 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:25 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:24 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:23 wkandek: reboot mw1274
* 20:23 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:23 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:22 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:20 wkandek: reboot mw1273
* 20:20 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:16 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:13 wkandek: reboot mw1287
* 20:13 wkandek: reboot mw1286
* 20:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:13 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:11 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 20:11 wkandek: reboot mw1272
* 20:11 wkandek: reboot mw1271
* 19:41 hashar: Upgrading Jenkins on contint2001 (primary)
* 19:25 hashar: contint1001: sudo systemctl mask jenkins  # spare server
* 19:25 mutante: all releases* servers except 1001 - disable puppet; stop jenkins, mask jenkins
* 19:22 mutante: releases1002 - stopped and masked jenkins service
* 19:22 mutante: releases2001 - stopped and masked jenkins service
* 19:20 mutante: upgrading jenkins on releases*001
* 19:19 hashar: Upgrading Jenkins on contint1001 (spare)
* 19:16 hashar@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.36.0-wmf.4
* 19:13 mutante: uploaded new jenkins version to APT repo; upgrading jenkins on releases1002/2002
* 19:08 effie: pool mw1396
* 19:06 effie: repool mw1395 mw1397 mw1399
* 18:56 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:55 root@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:50 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/Wikibase/client/includes/Store/Sql/DirectSqlStore.php: [[phab:T255305{{!}}Set caching of CachingEntityRevisionLookup to CACHE_NONE in client]] (duration: 02m 13s)
* 18:47 wkandek: reboot mw1270
* 18:47 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 18:45 wkandek: reboot mw1269
* 18:41 root@cumin1001: START - Cookbook sre.hosts.reboot-single
* 18:39 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 18:39 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 18:38 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 18:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:25 wkandek: reboot mw1268
* 18:25 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 18:25 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 18:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 18:22 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 18:22 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 18:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 18:17 jiji@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97)
* 18:17 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 18:16 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 18:10 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments on hewiki ([[phab:T255020|T255020]]) (duration: 01m 03s)
* 18:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 18:07 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 18:07 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 18:06 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 18:04 ladsgroup@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/Wikibase/repo/includes/Store/Sql/SqlStore.php: [[phab:T255305{{!}}Set caching of CachingEntityRevisionLookup to CACHE_NONE in repo]] (duration: 01m 06s)
* 18:02 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 18:02 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 18:00 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:56 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 17:52 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 17:52 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 17:52 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:51 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 17:51 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 17:51 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:50 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:49 effie: reboot mw1265 mw1282 mw1283
* 17:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:45 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 17:37 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 17:36 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 17:30 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:21 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:20 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 17:19 effie: reboot mw1263 mw1264 mw1279 and mw1281
* 17:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 17:17 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 17:16 cdanis: for posterity: mw1359 has a bunch of special packages installed (previously recorded in SAL) and also has `sudo memleak-bpfcc -o 60000 -z 31 -Z 33 30` running in a tmux in an attempt to understand what's causing the page fragmentation in the appserver fleet
* 17:16 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 17:16 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 17:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:15 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:13 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 17:13 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 17:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 17:00 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 16:57 sbassett@deploy1001: Synchronized private/PrivateSettings.php: Additional mitigations for [[phab:T257687|T257687]] (duration: 01m 03s)
* 16:53 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:52 hnowlan@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 16:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:50 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:48 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 16:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:44 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:35 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 16:35 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:32 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:31 effie: reboot mw1277 mw1278 && mw1261 mw1262
* 16:29 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 16:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 16:04 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:04 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I3726a6364d}}, [[phab:T257079|T257079]] (duration: 01m 02s)
* 15:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:52 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 15:50 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:48 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 15:48 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:42 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 15:37 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 15:36 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:32 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 15:32 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 15:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:26 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:22 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:21 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 15:19 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 15:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:15 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 15:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:12 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo apt install linux-headers-4.9.0-12-amd64
* 15:10 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo apt install python3-netaddr ieee-data
* 15:09 cdanis: ✔️ cdanis@mw1359.eqiad.wmnet ~ 🕚☕ sudo dpkg -i bpfcc-tools_0.12.0-2_all.deb libbpfcc_0.12.0-2_amd64.deb python3-bpfcc_0.12.0-2_all.deb
* 15:08 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 15:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 15:03 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 15:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 15:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 14:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 14:54 cdanis: again un-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports
* 14:53 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 14:52 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 14:45 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:44 cdanis: temporarily re-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports, original in my homedir
* 14:37 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 14:37 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 14:35 cdanis: un-kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports
* 14:32 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:31 cdanis: temporarily kludging deneb.codfw.wmnet:/var/cache/pbuilder/hooks/stretch/D02backports, original in my homedir
* 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:24 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:02 kormat: uploaded wmfmariadbpy 0.3 to apt
* 13:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:48 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:42 effie: restart mw1383 & mw1386
* 13:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:41 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:27 hashar@deploy1001: Synchronized php: group1 wikis to 1.36.0-wmf.4 (duration: 01m 16s)
* 13:25 hashar@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.36.0-wmf.4
* 13:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:19 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 13:15 cdanis: ✔️ cdanis@mw1357.eqiad.wmnet ~ 🕘☕ sudo sysctl -w vm/compact_memory=1
* 13:12 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 13:07 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 13:04 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:59 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:52 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:50 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:33 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1)
* 12:27 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:20 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:16 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:15 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 12:15 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 12:15 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:51 ema: pool mw1363 after reboot
* 11:49 jynus: creating artificial low replication lag on db2130 to test icinga alerts [[phab:T253120|T253120]]
* 11:41 ema@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:37 ema@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:30 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:28 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:25 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:24 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:21 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:17 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:13 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:10 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:08 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:07 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 11:07 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 11:00 oblivian@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 11:00 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single
* 10:55 _joe_: rebooting mw1361
* 10:51 jayme: rebooting mw1356
* 10:49 _joe_: rebooting mw1378
* 09:45 _joe_: repooling mw1377
* 09:40 _joe_: rebooting mw1377
* 09:22 _joe_: depool mw1357 tool
* 09:14 _joe_: depooling mw1377 for inspection
* 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1110', diff saved to https://phabricator.wikimedia.org/P12220 and previous config saved to /var/cache/conftool/dbconfig/20200812-091211-marostegui.json
* 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12219 and previous config saved to /var/cache/conftool/dbconfig/20200812-090831-marostegui.json
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12218 and previous config saved to /var/cache/conftool/dbconfig/20200812-085021-marostegui.json
* 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P12217 and previous config saved to /var/cache/conftool/dbconfig/20200812-083548-marostegui.json
* 07:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for reimage', diff saved to https://phabricator.wikimedia.org/P12215 and previous config saved to /var/cache/conftool/dbconfig/20200812-073130-marostegui.json
* 04:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 for MCR change', diff saved to https://phabricator.wikimedia.org/P12214 and previous config saved to /var/cache/conftool/dbconfig/20200812-045157-marostegui.json


== 2020-08-11 ==
== 2021-07-16 ==
* 23:41 Urbanecm: Evening B&C window completed
* 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:39 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0f238f71c95c7bd7534c28abfac759fbb47f674f}}: Update wgMFRemovableClasses ([[phab:T231160|T231160]]) (duration: 01m 03s)
* 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:36 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/MobileFrontend/extension.json: {{Gerrit|c22d65ff9b2439f484ab8ccffed87b00e78c3ad2}}: Hide vertical nav-boxes on mobile domain ([[phab:T231160|T231160]]) (duration: 01m 03s)
* 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:34 urbanecm@deploy1001: Synchronized php-1.36.0-wmf.4/extensions/MobileFrontend/extension.json: {{Gerrit|81d54b0ec82d0b78f723f9400031e918a4a143aa}}: Hide vertical nav-boxes on mobile domain ([[phab:T231160|T231160]]) (duration: 01m 05s)
* 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:07 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|28faa279dacf6a4d6f0a663844e913738c2fa142}}: Switching to updated license definition (duration: 01m 04s)
* 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
* 21:52 krinkle@deploy1001: Synchronized php-1.36.0-wmf.3/includes/skins/SkinMustache.php: {{Gerrit|Ibe1f07346}}, [[phab:T259872|T259872]], [[phab:T259858|T259858]] (duration: 01m 04s)
* 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
* 19:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P<nowiki>{</nowiki>elastic2*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
* 19:40 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
* 19:35 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Add streams for eventgate-main - [[phab:T251935|T251935]] (duration: 01m 04s)
* 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
* 19:21 ejegg: updated payments-wiki from {{Gerrit|f199c071c3}} to {{Gerrit|77ff5d70fc}}
* 15:48 vgutierrez: restart pybal on lvs2010
* 18:55 otto@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:48 otto@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 18:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 15:24 godog: downtime flappy pages in codfw for 40 minutes
* 18:44 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 15:14 godog: set alert2001 as active in netbox (was staged) - [[phab:T247966|T247966]]
* 18:28 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Grant investigate right to checkuser group on frwiki ([[phab:T260171|T260171]]) (duration: 01m 04s)
* 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
* 18:18 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: Beta-only: Configured additional settings for API Portal beta wiki gerrit:619339 (duration: 01m 03s)
* 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
* 18:05 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Direct GrowthExperiments help panel questions to mentors on cswiki ([[phab:T250235|T250235]]) (duration: 01m 03s)
* 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw [[phab:T286787|T286787]]
* 17:56 otto@deploy1001: Synchronized wmf-config/InitialiseSettings.php: EventStreamConfig - Remove extraneous mediawiki.api-request stream - [[phab:T251935|T251935]] (duration: 01m 01s)
* 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
* 17:53 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:53 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
* 17:43 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
* 17:43 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
* 17:38 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
* 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
* 17:33 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 17:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 17:28 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 17:25 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
* 17:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
* 16:58 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers ([[phab:T279309|T279309]])
* 16:53 hashar@deploy1001: Synchronized php-1.36.0-wmf.4/skins/MinervaNeue/: Revert "ServiceWiring: Avoid usage of deprecated Title::getSubjectPage()" - [[phab:T260155|T260155]] (duration: 01m 06s)
* 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 16:12 herron: migrating lists.wikimedia.org services from fermium to lists1001 [[phab:T224586|T224586]]
* 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
* 15:36 hashar@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.4
* 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 15:27 hashar@deploy1001: Finished scap: (no justification provided) (duration: 30m 51s)
* 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
* 14:59 marostegui: Deploy MCR change on db1116:3318
* 12:39 mutante: mw1412 through mw1428 - set to active in netbox ([[phab:T279309|T279309]])
* 14:56 hashar@deploy1001: Started scap: (no justification provided)
* 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:56 hashar@deploy1001: Pruned MediaWiki: 1.36.0-wmf.2 (duration: 04m 15s)
* 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
* 14:55 jayme: updated helmfile to 0.125.2-1 on contint* and deploy*
* 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
* 14:52 otto@deploy1001: Finished deploy [analytics/refinery@35c4430]: Deploying to an-launcher1002 to get camus wrapper script changes - [[phab:T251935|T251935]] (duration: 01m 14s)
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
* 14:51 otto@deploy1001: Started deploy [analytics/refinery@35c4430]: Deploying to an-launcher1002 to get camus wrapper script changes - [[phab:T251935|T251935]]
* 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
* 14:50 hashar@deploy1001: Pruned MediaWiki: 1.36.0-wmf.1 (duration: 02m 07s)
* 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:48 jayme: imported helmfile_0.125.2-1 to buster-wikimedia, jessie-wikimedia, stretch-wikimedia
* 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
* 14:47 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.41 (duration: 04m 20s)
* 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
* 14:40 hashar@deploy1001: Pruned MediaWiki: 1.35.0-wmf.40 (duration: 10m 24s)
* 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
* 14:37 papaul: replacing msw-b5,b6,b7 and b8
* 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
* 14:30 hashar: Cleaning old MediaWiki versions that were never removed
* 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
* 14:27 hashar@deploy1001: sync aborted: testwikis wikis to 1.36.0-wmf.4 (duration: 72m 36s)
* 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 14:10 hashar: mw1319: scap pull
* 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
* 13:27 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:27 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 13:23 oblivian@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
* 13:16 oblivian@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 13:14 hashar@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.4
* 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 13:12 hashar: Applied 1.36.0-wmf.4 security patches # [[phab:T257972|T257972]]
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 13:03 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
* 13:03 jayme@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
* 12:52 kormat: uploaded wmfmariadbpy 0.2 packages to apt1001
* 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
* 12:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:36 oblivian@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 11:54 marostegui: Install new MariaDB 10.4.14 on db2102
* 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 11:42 hnowlan@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
* 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 11:30 hnowlan@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
* 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
* 11:18 Urbanecm: EU B&C window done
* 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied [[phab:T132839|T132839]] workarounds)
* 11:08 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:619255{{!}}Enable ContentTranslation in Sundanese WP as a default tool (T258502)]] (duration: 00m 59s)
* 10:39 volans: migrating *all* eqiad mgmt DNS records to the autogenerated ones via Netbox - [[phab:T233183|T233183]]
* 10:38 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:34 volans@cumin1001: START - Cookbook sre.dns.netbox
* 10:20 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro-from-cdh (exit_code=0)
* 10:01 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro-from-cdh
* 10:00 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 09:51 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 09:29 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:25 volans@cumin1001: START - Cookbook sre.dns.netbox
* 09:11 marostegui: Rename tables on muswiki and mhwiktionary on s3 master (db1123) without replication [[phab:T260112|T260112]]
* 09:01 volans: renewed puppet certificate on scb1001.eqiad.wmnet
* 08:52 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e6ec237b6b6fb67a0a80613909589bc724f5eecf}}: Revert "Turn muswiki and mhwiktionary to read-only" ([[phab:T259004|T259004]]) (duration: 00m 58s)
* 08:45 urbanecm@deploy1001: Synchronized dblists/: {{Gerrit|81f4594b6c583f938821549b3a1800fec5b120bb}}: Point muswiki and mhwiktionary to s5 ([[phab:T259004|T259004]]; 3/3) (duration: 00m 58s)
* 08:44 urbanecm@deploy1001: Synchronized wmf-config/db-eqiad.php: {{Gerrit|81f4594b6c583f938821549b3a1800fec5b120bb}}: Point muswiki and mhwiktionary to s5 ([[phab:T259004|T259004]]; 2/3) (duration: 00m 58s)
* 08:43 urbanecm@deploy1001: Synchronized wmf-config/db-codfw.php: {{Gerrit|81f4594b6c583f938821549b3a1800fec5b120bb}}: Point muswiki and mhwiktionary to s5 ([[phab:T259004|T259004]]; 1/3) (duration: 01m 02s)
* 08:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a04bc1f27e4ef4e38002d546d30bfd2d1dc60d0e}}: Turn muswiki and mhwiktionary to read-only ([[phab:T259004|T259004]]) (duration: 01m 01s)
* 08:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
* 06:54 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
* 06:45 XioNoX: Re-prioritize peering over transit eqiad/esams - [[phab:T259614|T259614]]
* 01:59 tstarling@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: enabling fast stale mode [[phab:T250248|T250248]] (duration: 00m 58s)
* 00:33 dpifke@deploy1001: Finished deploy [performance/arc-lamp@fc5f1c6]: Deploying latest attempt to fix [[phab:T259167|T259167]] (duration: 01m 03s)
* 00:31 dpifke@deploy1001: Started deploy [performance/arc-lamp@fc5f1c6]: Deploying latest attempt to fix [[phab:T259167|T259167]]
* 00:24 mutante: reverting switch of releases.wikimedia.org for today since releases-jenkins.wikimedia.org is tied to it and new jenkins still needs some config and plugins ([[phab:T247652|T247652]])
* 00:08 mutante: releases-jenkins.wikimedia.org currently under maintenance ([[phab:T247652|T247652]])


== 2020-08-10 ==
== 2021-07-15 ==
* 23:56 eileen: tools revision changed from {{Gerrit|22550f38c5}}
* 23:32 brennen: checking stashbot: [[phab:T286756|T286756]]
* 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: [[gerrit:704815{{!}}Fix creation of mw.Message objects (T286385)]] (duration: 00m 57s)
* 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # [[phab:T285811|T285811]]
* 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # [[phab:T285811|T285811]]
* 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # [[phab:T285811|T285811]]
* 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
* 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki [[phab:T284928|T284928]]
* 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
* 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
* 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
* 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
* 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
* 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: {{Gerrit|eebdc4d}} “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 06s)
* 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: [[phab:T286611|T286611]] (duration: 01m 07s)
* 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
* 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
* 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
* 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
* 16:40 ejegg: updated payments-wiki from {{Gerrit|d9892207c1}} to {{Gerrit|844b59ee42}}
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 16:27 ejegg: updated fundraising CiviCRM from {{Gerrit|e0d53c92b5}} to {{Gerrit|20e9ef6bbb}}
* 16:24 ejegg: updated payments-wiki from {{Gerrit|0e7800027a}} to {{Gerrit|844b59ee42}}
* 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
* 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: [[gerrit:704773{{!}}flaggedrevs: Allow admins of idwiki to change stablesettings (T268317)]], try II (duration: 01m 05s)
* 15:03 Amir1: temporarily becoming admin on idwiki to debug [[phab:T268317|T268317]]
* 15:02 moritzm: installing nginx security updates on ms-fe*
* 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
* 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
* 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade [[phab:T273281|T273281]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]{{!}}10).codfw.wmnet
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
* 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
* 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
* 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
* 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
* 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
* 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
* 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet
* 13:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
* 13:05 mutante: mw1413 - pooling, was depooled but for unknown reason, don't see it in SAL, looks ok, scap pulled
* 13:03 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1422.eqiad.wmnet
* 13:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade [[phab:T273281|T273281]]
* 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for [[phab:T273281|T273281]]
* 12:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[3-5].eqiad.wmnet
* 12:54 mutante: mw1423, mw1424, mw1425 - pooled as new API servers
* 12:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
* 12:53 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[3-5].eqiad.wmnet
* 12:51 dzahn@cumin1001: conftool action : set
* 17:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:14 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:14 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:12 volans@
* 16:27 godog: upgrade grafana to 8 beta 2 on grafana2001
* 15:48 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 15:46 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 15:46 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 15:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 15:44 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 15:43 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 15:33 moritzm: installing graphviz security updates on buster
* 15:31 ryankemper: [cloudelastic] `ryankemper@cloudelastic1003:~$ sudo systemctl restart *search*` to clear `Check systemd state` alert on `cloudelastic1003`
* 15:30 _joe_: test
* 15:23 moritzm: installing graphviz security updates on buster
* 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 15:21 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16128 and previous config saved to /var/cache/conftool/dbconfig/20210520-143825-marostegui.json
* 13:58 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.6 (duration: 01m 05s)
* 13:57 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.6
* 13:52 hashar@deploy1002: Synchronized php-1.37.0-wmf.6/includes/upload/UploadFromStash.php: UploadFromStash: convert default user from false to null - [[phab:T283196|T283196]] (duration: 01m 05s)
* 13:50 hashar@deploy1002: Synchronized php-1.37.0-wmf.6/includes/user/ActorStore.php: ActorStore: avoid throwing in case of invalid usernames [[phab:T283167|T283167]] (duration: 01m 05s)
* 13:41 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.0 (duration: 01m 20s)
* 13:39 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.0
* 12:30 kormat: Deploying wmfmariadbpy 0.7 [[phab:T283228|T283228]]
* 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16126 and previous config saved to /var/cache/conftool/dbconfig/20210520-113529-root.json
* 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (


== 2020-08-09 ==
== 2021-05-08 ==
* 21:58 ejegg: updated payments-wiki from {{Gerrit|cd012f37f1}} to {{Gerrit|932aacde54}}
* 17:18 Amir1: starting upgrade of batch G of mailing lists ([[phab:T280322|T280322]])
* 03:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)


== 2020-08-08 ==
== 2021-05-07 ==
* 02:23 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 21:40 legoktm: deleted education@ from MM3, didn't import properly
* 02:21 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97)
* 21:35 legoktm: deleted festivalsommer-teilnehmer from MM3, didn't import properly
* 02:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-reload
* 21:33 legoktm: fixed owner for wdqs-gui-build list
* 19:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
* 18:55 legoktm: deleted daily-article-l from mailman3 after failed import
* 18:33 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
* 18:28 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
* 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
* 18:23 brennen: 1.37.0-wmf.4 train status ([[phab:T281145|T281145]]): blockers appear resolved, going ahead in the interest of not having a split deploy over weekend
* 17:50 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/cache/LinkBatch.php: Backport: [[gerrit:685901{{!}}LinkBatch: skip bad input (T282180 T282070)]] (duration: 01m 06s)
* 17:25 andrew@deploy1002: Finished deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev (duration: 01m 55s)
* 17:23 andrew@deploy1002: Started deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev
* 15:10 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 24s)
* 15:08 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 15:03 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 11s)
* 15:02 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 15:02 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 26s)
* 15:00 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 15:00 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 29s)
* 14:58 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 14:57 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 22s)
* 14:56 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 14:41 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
* 14:40 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 19s)
* 14:38 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 14:38 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 00m 50s)
* 14:37 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
* 13:04 Urbanecm: Start server-side upload for 1 video file ([[phab:T281927|T281927]])
* 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15856 and previous config saved to /var/cache/conftool/dbconfig/20210507-121908-kormat.json
* 12:04 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15855 and previous config saved to /var/cache/conftool/dbconfig/20210507-120404-kormat.json
* 11:49 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15854 and previous config saved to /var/cache/conftool/dbconfig/20210507-114859-kormat.json
* 11:33 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: reimaged to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15853 and previous config saved to /var/cache/conftool/dbconfig/20210507-113355-kormat.json
* 09:55 dcausse: depooling wdqs1012 [[phab:T280382|T280382]], [[phab:T282222|T282222]]
* 09:44 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@codfw - [[phab:T281673|T281673]]
* 08:50 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2005.wikimedia.org
* 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
* 08:15 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqsin - [[phab:T281673|T281673]]
* 08:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15849 and previous config saved to /var/cache/conftool/dbconfig/20210507-074725-root.json
* 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15848 and previous config saved to /var/cache/conftool/dbconfig/20210507-073222-root.json
* 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15847 and previous config saved to /var/cache/conftool/dbconfig/20210507-071718-root.json
* 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15846 and previous config saved to /var/cache/conftool/dbconfig/20210507-070214-root.json
* 06:17 marostegui: Deploy schema change on s2 codfw, lag will appear [[phab:T266486|T266486]] [[phab:T268392|T268392]] [[phab:T273360|T273360]]
* 06:11 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/api/ApiQueryLogEvents.php: fix UBN [[phab:T282122|T282122]] (duration: 01m 10s)
* 06:09 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/api/ApiQueryLogEvents.php: fix UBN [[phab:T282122|T282122]] (duration: 01m 06s)
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for schema change', diff saved to https://phabricator.wikimedia.org/P15845 and previous config saved to /var/cache/conftool/dbconfig/20210507-055425-marostegui.json
* 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15844 and previous config saved to /var/cache/conftool/dbconfig/20210507-055350-root.json
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15842 and previous config saved to /var/cache/conftool/dbconfig/20210507-053847-root.json
* 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15841 and previous config saved to /var/cache/conftool/dbconfig/20210507-052343-root.json
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T282093|T282093]]', diff saved to https://phabricator.wikimedia.org/P15840 and previous config saved to /var/cache/conftool/dbconfig/20210507-051519-marostegui.json
* 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15839 and previous config saved to /var/cache/conftool/dbconfig/20210507-050839-root.json
* 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P15837 and previous config saved to /var/cache/conftool/dbconfig/20210507-043350-marostegui.json


== 2020-08-07 ==
== 2021-05-06 ==
* 16:42 jforrester@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/DiscussionTools/: [[phab:T259855|T259855]] Revert new reply API (duration: 01m 06s)
* 23:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: Rollback group1 and group2 to 1.37.0-wmf.3 ([[phab:T282193|T282193]])
* 15:01 volans: import DNS names for network devices in Netbox - [[phab:T258729|T258729]]
* 22:52 legoktm: upgrading mailman3 and hyperkitty on lists1001 ([[phab:T282092|T282092]])
* 13:27 godog: bounce pybal on lvs1016 and then lvs1015 to reset state, logstash1025 reported down but actually up
* 22:11 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials/SpecialWatchlist.php: Backport: [[gerrit:685890{{!}}Reorder tables in SpecialWatchlist (T282181)]] (duration: 00m 57s)
* 10:27 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 21:48 legoktm: upgraded mailman3 and hyperkitty on lists1002 ([[phab:T282092|T282092]])
* 10:27 sukhe@cumin1001: START - Cookbook sre.hosts.downtime
* 21:46 legoktm: uploaded new mailman3 and hyperkitty packages to apt.wm.o ([[phab:T282092|T282092]])
* 10:02 elukey: reboot deneb via ganeti2021 (hostname config pointing to recdns for some reason)
* 21:11 hashar: restarted CI Jenkins due to [[phab:T281737|T281737]]
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1092', diff saved to https://phabricator.wikimedia.org/P12195 and previous config saved to /var/cache/conftool/dbconfig/20200807-091527-marostegui.json
* 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12194 and previous config saved to /var/cache/conftool/dbconfig/20200807-084747-marostegui.json
* 19:04 ejegg: updated fundraising CiviCRM from {{Gerrit|8034e47008}} to {{Gerrit|2052d79248}}
* 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12193 and previous config saved to /var/cache/conftool/dbconfig/20200807-080719-marostegui.json
* 18:58 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:685906{{!}}Migrate WikidataCompletionSearchClicks to event platform on all wikis (T282140)]] (duration: 01m 04s)
* 07:50 godog: prometheus codfw lvextend --resize --size +60G /dev/mapper/vg--hdd-prometheus--global
* 18:55 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: {{Gerrit|338d1df5903cdc963b9eef22ec2c1750b7b3a02b}}: Wikibase: Use wikidataclient-test dblist for testwikidata localClientDatabases ([[phab:T282160|T282160]]) (duration: 01m 05s)
* 07:49 godog: prometheus codfw lvextend --resize --size +30G /dev/mapper/vg--ssd-prometheus--k8s
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: {{Gerrit|7e21cf0d96541d0ab5cb18cd7741756ab1dfe7b8}}: NO-OP: Wikibase: Use wikidataclient dblist directly for repo localClientDatabases ([[phab:T282160|T282160]]) (duration: 01m 04s)
* 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P12192 and previous config saved to /var/cache/conftool/dbconfig/20200807-074658-marostegui.json
* 18:31 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare WikidataCompletionSearchClicks stream and migrate on testwiki - [[phab:T282140|T282140]] (duration: 01m 06s)
* 06:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:59 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin1001.eqiad.wmnet
* 06:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 17:59 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
* 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for upgrade', diff saved to https://phabricator.wikimedia.org/P12191 and previous config saved to /var/cache/conftool/dbconfig/20200807-063431-marostegui.json
* 17:47 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.remove-downtime (exit_code=99) for cumin1001.eqiad.wmnet
* 17:47 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
* 17:35 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
* 17:20 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:15 volans: upgrade spicerack on cumin* to 0.0.52
* 17:15 ryankemper: [Elastic] Set `elastic2043` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
* 17:13 papaul: powerdown ms-be2057 for relocation
* 17:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
* 17:12 volans: uploaded spicerack_0.0.52 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 17:00 papaul: powerdown elastic2058 for relocation
* 16:43 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@ulsfo - [[phab:T281673|T281673]]
* 16:12 papaul: powerdown mc-gp2002 for relocation
* 16:09 ryankemper: [Elastic] Set `elastic2058` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
* 15:58 Amir1: starting upgrade of public mailing lists in group d and e ([[phab:T280322|T280322]])
* 15:50 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
* 15:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
* 15:42 papaul: powerdown logstash2027 for relocation
* 15:41 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
* 15:40 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 15:34 XioNoX: push cloud-gw-transport-eqiad to asw2-b-eqiad and cloudsw
* 15:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 15:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1012.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 15:32 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2003.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 15:31 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
* 15:29 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
* 15:29 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
* 15:26 ryankemper: [[phab:T280382|T280382]] [WDQS] Pooled `wdqs1007` and `wdqs2004`
* 15:26 ryankemper: [[phab:T280382|T280382]] `wdqs2004.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 15:26 ryankemper: [[phab:T280382|T280382]] `wdqs1007.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2        2.6T  998G  1.5T  40% /srv`
* 15:20 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:16 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 15:14 papaul: powerdown ms-be2053 for relocation
* 15:10 moritzm: imported wmfbackups 0.5+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
* 15:07 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: [[phab:T270704|T270704]]
* 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: [[phab:T270704|T270704]]
* 15:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 105 hosts with reason: [[phab:T270704|T270704]]
* 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 105 hosts with reason: [[phab:T270704|T270704]]
* 15:06 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 15:05 moritzm: imported wmfmariadbpy 0.6+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
* 14:55 papaul: powerdown kafka-main2002 for relocation
* 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P15833 and previous config saved to /var/cache/conftool/dbconfig/20210506-143002-marostegui.json
* 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15829 and previous config saved to /var/cache/conftool/dbconfig/20210506-140916-marostegui.json
* 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15828 and previous config saved to /var/cache/conftool/dbconfig/20210506-133738-root.json
* 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15827 and previous config saved to /var/cache/conftool/dbconfig/20210506-132234-root.json
* 13:21 XioNoX: push pfw policies - [[phab:T281942|T281942]]
* 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15826 and previous config saved to /var/cache/conftool/dbconfig/20210506-130730-root.json
* 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15825 and previous config saved to /var/cache/conftool/dbconfig/20210506-125226-root.json
* 11:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts eventlog1002.eqiad.wmnet
* 11:35 mlitn@deploy1002: Synchronized wmf-config: Config: [[gerrit:685752{{!}}Enable Extension:MediaSearch on betacommons (T265939)]] (duration: 01m 06s)
* 11:34 mlitn@deploy1002: sync-file aborted: Config: [[gerrit:685752{{!}}Enable Extension:MediaSearch on betacommons (T265939)]] (duration: 00m 56s)
* 11:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 11:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
* 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
* 11:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts eventlog1002.eqiad.wmnet
* 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
* 11:23 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:685554{{!}}Enable ReferencePreviews as full default on pilot wikis (T271206)]] (duration: 01m 06s)
* 11:22 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:685554{{!}}Enable ReferencePreviews as full default on pilot wikis (T271206)]] (duration: 01m 06s)
* 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db1173 depooling: Reimage to buster [[phab:T280751|T280751]]', diff saved to https://phabricator.wikimedia.org/P15824 and previous config saved to /var/cache/conftool/dbconfig/20210506-111256-kormat.json
* 11:12 kormat: reimaging db1173 to buster [[phab:T280751|T280751]]
* 10:59 volans: upgrading spicerack on cumin hosts to 0.0.51-1
* 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15823 and previous config saved to /var/cache/conftool/dbconfig/20210506-105909-marostegui.json
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15822 and previous config saved to /var/cache/conftool/dbconfig/20210506-105850-root.json
* 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15821 and previous config saved to /var/cache/conftool/dbconfig/20210506-104346-root.json
* 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15820 and previous config saved to /var/cache/conftool/dbconfig/20210506-102842-root.json
* 10:19 jynus: stop dbprov2002 in advance of maintenance [[phab:T281135|T281135]]
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15819 and previous config saved to /var/cache/conftool/dbconfig/20210506-101339-root.json
* 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 09:45 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P15818 and previous config saved to /var/cache/conftool/dbconfig/20210506-092217-marostegui.json
* 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15817 and previous config saved to /var/cache/conftool/dbconfig/20210506-091818-root.json
* 09:03 elukey: sudo apt-get remove linux-image-4.19.0-11-amd64 linux-image-4.19.0-9-amd64 linux-image-4.19.0-13-amd64 on ping[123]001 host to free some space (tiny root partition, these are old kernels)
* 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15816 and previous config saved to /var/cache/conftool/dbconfig/20210506-090315-root.json
* 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15815 and previous config saved to /var/cache/conftool/dbconfig/20210506-084811-root.json
* 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 db1167', diff saved to https://phabricator.wikimedia.org/P15814 and previous config saved to /var/cache/conftool/dbconfig/20210506-084754-marostegui.json
* 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and db1167 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15813 and previous config saved to /var/cache/conftool/dbconfig/20210506-084443-marostegui.json
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15812 and previous config saved to /var/cache/conftool/dbconfig/20210506-083910-root.json
* 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15811 and previous config saved to /var/cache/conftool/dbconfig/20210506-083307-root.json
* 08:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1007.eqiad.wmnet
* 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15810 and previous config saved to /var/cache/conftool/dbconfig/20210506-082406-root.json
* 08:23 moritzm: imported wikimedia-lvs-realserver to apt.wikimedia.org/bullseye [[phab:T275873|T275873]]
* 08:18 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1007.eqiad.wmnet
* 08:16 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1006.eqiad.wmnet
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15809 and previous config saved to /var/cache/conftool/dbconfig/20210506-080902-root.json
* 08:06 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1006.eqiad.wmnet
* 08:04 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1005.eqiad.wmnet
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15808 and previous config saved to /var/cache/conftool/dbconfig/20210506-075416-marostegui.json
* 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15807 and previous config saved to /var/cache/conftool/dbconfig/20210506-075359-root.json
* 07:47 jynus: shutting down and removing db2098:s3 instance
* 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15806 and previous config saved to /var/cache/conftool/dbconfig/20210506-074746-marostegui.json
* 07:45 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1005.eqiad.wmnet
* 07:29 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@cp[4026,4032] - [[phab:T281673|T281673]]
* 07:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 07:24 moritzm: installing exim security updates on bullseye hosts
* 07:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15805 and previous config saved to /var/cache/conftool/dbconfig/20210506-064020-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15804 and previous config saved to /var/cache/conftool/dbconfig/20210506-062931-root.json
* 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15803 and previous config saved to /var/cache/conftool/dbconfig/20210506-062915-root.json
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15802 and previous config saved to /var/cache/conftool/dbconfig/20210506-062516-root.json
* 06:20 elukey: apt-get clean on ping[1,2,3]001 to free some space
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15801 and previous config saved to /var/cache/conftool/dbconfig/20210506-061427-root.json
* 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15800 and previous config saved to /var/cache/conftool/dbconfig/20210506-061411-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15799 and previous config saved to /var/cache/conftool/dbconfig/20210506-061012-root.json
* 06:01 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 06:00 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
* 06:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15798 and previous config saved to /var/cache/conftool/dbconfig/20210506-055923-root.json
* 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15797 and previous config saved to /var/cache/conftool/dbconfig/20210506-055907-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 [[phab:T281445|T281445]]', diff saved to https://phabricator.wikimedia.org/P15796 and previous config saved to /var/cache/conftool/dbconfig/20210506-055535-marostegui.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15795 and previous config saved to /var/cache/conftool/dbconfig/20210506-055509-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15794 and previous config saved to /var/cache/conftool/dbconfig/20210506-054419-root.json
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15793 and previous config saved to /var/cache/conftool/dbconfig/20210506-054404-root.json
* 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 05:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 and db1158 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15792 and previous config saved to /var/cache/conftool/dbconfig/20210506-053801-marostegui.json
* 05:38 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 05:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
* 05:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:32 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/page/PageReferenceValue.php: fixing [[phab:T282070|T282070]] RC/log breakage due to unblocking autoblocks (duration: 01m 09s)
* 05:27 effie: upgrade scap to 3.17.1-1 - [[phab:T279695|T279695]]
* 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
* 03:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
* 03:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
* 03:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
* 03:38 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs1007.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:38 ryankemper: [[phab:T280382|T280382]] `sudo -i wmf-auto-reimage-host -p [[phab:T280382|T280382]] wdqs2004.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
* 03:18 ryankemper: [Elastic] `elastic2043` is ssh unreachable. Power cycling it to bring it briefly back online - if it has the shard it should be able to repair the cluster state. Otherwise I'll have to delete the index for `enwiki_titlesuggest_1620184482` given the data would be unrecoverable
* 03:08 ryankemper: [Elastic] `ryankemper@elastic2044:~$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/_cluster/settings -d '<nowiki>{</nowiki>"transient":<nowiki>{</nowiki>"cluster.routing.allocation.exclude":<nowiki>{</nowiki>"_host": null,"_name": null<nowiki>}}}</nowiki>'`
* 03:08 ryankemper: [Elastic] Temporarily unbanning `elastic2033` and `elastic2043` from `production-search-codfw` to see if we can get the cluster green again. If it returns to green then we'll ban one node, wait for the shards to redistribute, and then ban the other
* 03:06 ryankemper: [Elastic] I banned two nodes simultaneously earlier today - if there's an index with only 1 replica, and its primary and replica happened to be on the two nodes I banned, then that would have caused this situation
* 03:04 ryankemper: [Elastic] It looks like we've got a single missing shard in `production-search-codfw` (port 9200), which is putting the cluster into red status. The cluster won't get back into green status without intervention
* 02:56 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - [[phab:T280563|T280563]]
* 00:35 Amir1: sudo service mailman3-web restart


== 2020-08-06 ==
== 2021-05-05 ==
* 23:21 catrope@deploy1001: Synchronized php-1.36.0-wmf.3/extensions/GrowthExperiments/: Fixes for WelcomeSurvey language question ([[phab:T232410|T232410]]) (duration: 00m 59s)
* 23:35 ryankemper: [[phab:T281621|T281621]] [[phab:T281327|T281327]] [Elastic] Banned `elastic2033` and `elastic2043` from the Cirrussearch Elasticsearch clusters
* 23:04 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change GrowthExperiments mentor list on fawiki ([[phab:T253291|T253291]]) (duration: 00m 59s)
* 23:10 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GlobalWatchlist/modules/SpecialGlobalWatchlist.display.css: {{Gerrit|4947241f876234aabc578409c3691fb791c8f715}}: Fix centering of as-of label (duration: 01m 08s)
* 21:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:13 mutante: welcome new deployer derick - user created on deploy1002 and bastions ([[phab:T281564|T281564]])
* 21:41 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 22:05 mutante: pushing puppet run on all bastion hosts
* 21:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 21:45 mutante: mailing lists: approved Alangi Derick's pending request for membership in ops mailing list (is becoming deployer) [[phab:T281309|T281309]]
* 21:39 mholloway-shell@deploy1001: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 21:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/CentralAuth/includes/CentralAuthUser.php: {{Gerrit|52b134ed84c1c8ef5fcd6927f03567879553d31c}}: Cross-wiki block should pass correct wiki blocker ([[phab:T281972|T281972]]) (duration: 01m 09s)
* 21:39 andrew@cumin1001: START - Cookbook sre.hosts.downtime
* 21:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/CentralAuth/includes/CentralAuthUser.php: {{Gerrit|6526884848d0bb88c83cec2c6b39461542e21ef6}}: Cross-wiki block should pass correct wiki blocker ([[phab:T281972|T281972]]) (duration: 01m 08s)
* 21:35 mholloway-shell@deploy1001: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 21:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/user/UserIdentityValue.php: {{Gerrit|f189c4627cfc692fb743160030a5e5ab92df1485}}: UserIdentityValue: Introduce convenience static factory methods ([[phab:T281972|T281972]]) (duration: 01m 09s)
* 21:33 brennen@deploy1001: Synchronized php-1.36.0-wmf.3/vendor: [[gerrit:618850{{!}}Update git submodules (vendor)]] ([[phab:T259832|T259832]]) (duration: 01m 08s)
* 21:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/includes/user/UserIdentityValue.php: {{Gerrit|8ffb52d5cad9e003696200b9cd3e957ab26bc868}}: UserIdentityValue: Introduce convenience static factory methods ([[phab:T281972|T281972]]) (duration: 01m 11s)
* 21:32 mholloway-shell@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 21:29 urbanecm@deploy1002: sync-file aborted: {{Gerrit|8ffb52d5cad9e003696200b9cd3e957ab26bc868}}: UserIdentityValue: Introduce convenience static factory methods ([[phab:T281972|T281972]]) (duration: 00m 04s)
* 20:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 20:37 ejegg: updated email preferences wiki (donorwiki) from {{Gerrit|d449599540}} to {{Gerrit|9f51ace546}}
* 20:51 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 20:36 ejegg: updated payments-wiki from {{Gerrit|d449599540}} to {{Gerrit|9f51ace546}}
* 20:47 shdubsh: restart logstash -- pipeline appears stuck
* 20:20 ejegg: updated email preferences wiki (donorwiki) from {{Gerrit|a232fc3438}} to {{Gerrit|d449599540}}
* 20:38 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 19:59 jbond42: re-enable puppet post 685485
* 20:38 otto@deploy1001: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 19:53 jbond42: disa