You are browsing a read-only backup copy of Wikitech. The live site can be found at wikitech.wikimedia.org

Difference between revisions of "Server Admin Log"

From Wikitech-static
Jump to navigation Jump to search
imported>Stashbot
(ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump cirrus MLR models to latest (duration: 01m 06s))
imported>Stashbot
(mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .)
(332 intermediate revisions by 3 users not shown)
Line 1: Line 1:
== 2020-07-21 ==
== 2021-08-03 ==
* 23:37 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump cirrus MLR models to latest (duration: 01m 06s)
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:13 Urbanecm: Evening backport window done
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:12 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|7a50168d54b5e86834606fb8d7880eb3a923ffd5}}: Updating UploadWizard template: PD-old-70-1923->PD-old-70-expired ([[phab:T258523|T258523]]) (duration: 01m 06s)
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 23:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7acc9d966a07d589bb6aed5f801c9e1defc75fe1}}: Enable $wgWatchlistExpiry on testwiki ([[phab:T257506|T257506]]) (duration: 01m 08s)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:10 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.36.0-wmf.1
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:02 catrope@deploy1001: Synchronized php-1.36.0-wmf.1/includes/Storage/PageUpdater.php: Fix handling of null edits ([[phab:T257766|T257766]]) (duration: 01m 06s)
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 19:01 catrope@deploy1001: Synchronized php-1.35.0-wmf.41/includes/Storage/PageUpdater.php: Fix handling of null edits ([[phab:T257766|T257766]]) (duration: 01m 11s)
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 18:33 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.36.0-wmf.1 (duration: 41m 22s)
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 18:27 ejegg: restored new URL for TY page in payments-wiki settings
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 18:22 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1] (thin): Redeploying to unbreak unique devices per domain monthly THIN [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 00m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 18:22 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1] (thin): Redeploying to unbreak unique devices per domain monthly THIN [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 18:21 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - third try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 00m 12s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:21 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - third try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:17 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - second try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 00m 17s)
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:16 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly - second try [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 18:13 mforns@deploy1001: Finished deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd] (duration: 05m 32s)
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:08 mforns@deploy1001: Started deploy [analytics/refinery@0c25de1]: Redeploying to unbreak unique devices per domain monthly [analytics/refinery@0c25de19a3a309276654b4463cca4f574336d8fd]
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 17:52 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.36.0-wmf.1
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 17:50 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 17:45 volans@cumin1001: START - Cookbook sre.dns.netbox
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 17:10 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.39 (duration: 16m 25s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 16:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase, take 2 (duration: 04m 54s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 16:27 ppchelko@deploy1001: Started deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase, take 2
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 16:27 ppchelko@deploy1001: Finished deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase (duration: 10m 37s)
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 16:21 longma: 1.36.0-wmf.1 was branched at {{Gerrit|3a1faac3764ecae8dde813bd67a5a8e8f4975a85}} for [[phab:T257969|T257969]]
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 16:16 ppchelko@deploy1001: Started deploy [restbase/deploy@4f3cb41]: Add new wikis to RESTBase
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 15:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 15:10 moritzm: draining restbase1027 for eventual reboot for kernel security update
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 15:09 godog: poweroff ms-be1024 for bbu replacement - [[phab:T257949|T257949]]
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:08 filippo@cumin1001: START - Cookbook sre.hosts.downtime
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 15:01 vgutierrez: show a synthetic warning for traffic using ECDHE-RSA-AES128-SHA - [[phab:T258405|T258405]]
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 15:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:00 moritzm: draining restbase1026 for eventual reboot for kernel security update
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 14:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 14:57 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 14:52 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 14:51 moritzm: draining restbase1025 for eventual reboot for kernel security update
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 14:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 14:44 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 14:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=mobileapps,name=scb.*
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 14:35 akosiaris: decrease codfw mobileapps kubernetes traffic to 72% [[phab:T218733|T218733]]. Weird latency patterns exhibited when 92% was reached. See https://grafana.wikimedia.org/d/5CmeRcnMz/mobileapps?panelId=34&fullscreen&orgId=1&from=1595338489749&to=1595342071227&var-dc=codfw%20prometheus%2Fk8s&var-service=mobileapps&var-container_name=All
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 14:35 moritzm: draining restbase1024 for eventual reboot for kernel security update
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P11994 and previous config saved to /var/cache/conftool/dbconfig/20200721-143204-marostegui.json
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11993 and previous config saved to /var/cache/conftool/dbconfig/20200721-142634-marostegui.json
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:24 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 14:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 14:19 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11992 and previous config saved to /var/cache/conftool/dbconfig/20200721-141813-marostegui.json
* 16:59 hashar: Gerrit has been upgraded
* 14:16 moritzm: draining restbase1023 for eventual reboot for kernel security update
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 14:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 14:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:45 hashar: Stopping Gerrit for upgrade
* 14:06 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 14:03 moritzm: draining restbase1022 for eventual reboot for kernel security update
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 14:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 13:57 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 13:55 moritzm: draining restbase1021 for eventual reboot for kernel security update
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11991 and previous config saved to /var/cache/conftool/dbconfig/20200721-135028-marostegui.json
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:46 moritzm: draining restbase1020 for eventual reboot for kernel security update
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:42 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:41 akosiaris: increase codfw mobileapps kubernetes traffic to 96% [[phab:T218733|T218733]]
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:41 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:41 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:15 Amir1: end of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T258472|T258472]] [[phab:T258473|T258473]])
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 13:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 13:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 13:10 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 13:06 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 13:03 moritzm: draining restbase1019 for eventual reboot for kernel security update
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:01 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 13:01 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json  on labstore1006 and labstore1007 (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:55 Amir1: start of ladsgroup@mwmaint1002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T258472|T258472]] [[phab:T258473|T258473]])
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root causing an analytics import job to fail), ping apergos
* 12:54 marostegui: Stop haproxy on dbproxy1012 - [[phab:T255408|T255408]]
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087', diff saved to https://phabricator.wikimedia.org/P11988 and previous config saved to /var/cache/conftool/dbconfig/20200721-121302-marostegui.json
* 12:05 moritzm: installing libgcrypt20 security updates
* 12:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 12:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:45 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 Urbanecm: EU B&C window done
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|7b96c7ea35557888c6cec2dd19768c246bff804b}}: Enable botpasswords at checkuserwiki and stewardwiki ([[phab:T258358|T258358]], [[phab:T258355|T258355]]) (duration: 00m 57s)
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:11 Urbanecm: Create bot_passwords table at checkuserwiki ([[phab:T258358|T258358]])
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:10 Urbanecm: Create bot_passwords table at stewardwiki ([[phab:T258355|T258355]])
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5d5bb37c342310be5ca0b0e11a8490703867f4fd}}: Enable Vector opt in preference everywhere ([[phab:T254228|T254228]]) (duration: 00m 57s)
* 09:18 marostegui: Failover m1, m2 and m3-master [[phab:T287574|T287574]]
* 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1085 [[phab:T258360|T258360]]', diff saved to https://phabricator.wikimedia.org/P11987 and previous config saved to /var/cache/conftool/dbconfig/20200721-110854-marostegui.json
* 09:12 moritzm: installinh php 7.0 security updates on stretch
* 11:00 effie: enable puppet on  P:mediawiki::mcrouter_wancache - [[phab:T247956|T247956]]
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085 [[phab:T258360|T258360]]', diff saved to https://phabricator.wikimedia.org/P11986 and previous config saved to /var/cache/conftool/dbconfig/20200721-105852-marostegui.json
* 08:57 moritzm: installing pillow security updates on stretch
* 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085 [[phab:T258360|T258360]]', diff saved to https://phabricator.wikimedia.org/P11985 and previous config saved to /var/cache/conftool/dbconfig/20200721-104546-marostegui.json
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P11984 and previous config saved to /var/cache/conftool/dbconfig/20200721-103430-marostegui.json
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 10:20 effie: disable puppet on P:mediawiki::mcrouter_wancache - [[phab:T247956|T247956]]
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 10:13 effie: enable puppet on on wtp*
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 10:02 marostegui: Analyze revision table on db1119 [[phab:T258480|T258480]]
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 [[phab:T258480|T258480]]', diff saved to https://phabricator.wikimedia.org/P11983 and previous config saved to /var/cache/conftool/dbconfig/20200721-100159-marostegui.json
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:59 akosiaris: move all codfw mobileapps nodes (kubernetes and scb) to weight 10. Traffic level remains at 72.727272% flowing to kubernetes, the rest to scb [[phab:T218733|T218733]]
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:59 akosiaris: move all codfw mobileapps nodes (kubernetes and scb) to weight 10. Traffic level remains at 72.727272% flowing to kubernetes, the rest to scb
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 09:59 effie: disable puppet on wtp* to merge 613307
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 09:58 akosiaris@cumin1001: conftool action : set/weight=10; selector: dc=codfw,service=mobileapps
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 09:58 akosiaris: increase codfw mobileapps kubernetes traffic to 72.727272% [[phab:T218733|T218733]]
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 09:57 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=scb.*
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 09:44 elukey: add term 'idp' to analytics-in4/6 filters on cr1-eqiad and cr2-eqiad (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/615160)
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 09:21 kormat@cumin1001: dbctl commit (dc=all): 'Re-pool es1020 at 25% in es4 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11982 and previous config saved to /var/cache/conftool/dbconfig/20200721-092126-kormat.json
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 08:37 akosiaris: increase codfw mobileapps kubernetes traffic to 47% [[phab:T218733|T218733]]
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 08:34 akosiaris@cumin1001: conftool action : set/weight=3; selector: dc=codfw,service=mobileapps,name=scb.*
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 08:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 08:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1119', diff saved to https://phabricator.wikimedia.org/P11980 and previous config saved to /var/cache/conftool/dbconfig/20200721-080842-marostegui.json
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11979 and previous config saved to /var/cache/conftool/dbconfig/20200721-075233-marostegui.json
* 07:49 marostegui: Deploy schema change on db1087, lag will appear on s8 (wikidata) on labsdb hosts [[phab:T256685|T256685]]
* 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 [[phab:T256685|T256685]]', diff saved to https://phabricator.wikimedia.org/P11978 and previous config saved to /var/cache/conftool/dbconfig/20200721-074843-marostegui.json
* 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11977 and previous config saved to /var/cache/conftool/dbconfig/20200721-073757-marostegui.json
* 07:29 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Re-enable writes to es4 [[phab:T257847|T257847]] (duration: 00m 57s)
* 07:22 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1020 from es4 [[phab:T257847|T257847]]', diff saved to https://phabricator.wikimedia.org/P11976 and previous config saved to /var/cache/conftool/dbconfig/20200721-072251-kormat.json
* 07:21 kormat@cumin1001: dbctl commit (dc=all): 'Promote es1021 to es4 master [[phab:T257847|T257847]]', diff saved to https://phabricator.wikimedia.org/P11975 and previous config saved to /var/cache/conftool/dbconfig/20200721-072127-kormat.json
* 07:13 kormat: killing James_F('s script) on mwmaint1002
* 07:06 _joe_: systemctl reset-failed on deneb, the usual known issue with releng image reporting
* 07:03 kormat@deploy1001: Synchronized wmf-config/db-eqiad.php: Disable writes to es4 [[phab:T257847|T257847]] (duration: 01m 00s)
* 06:59 kormat: Starting es4 failover from es1020 to es1021 [[phab:T257847|T257847]]
* 06:54 kormat@cumin1001: dbctl commit (dc=all): 'Set es1021 to weight 50 [[phab:T257847|T257847]]', diff saved to https://phabricator.wikimedia.org/P11974 and previous config saved to /var/cache/conftool/dbconfig/20200721-065457-kormat.json
* 06:54 marostegui: Pool db1119 into enwiki with MCR schema change done - [[phab:T238966|T238966]]
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1119', diff saved to https://phabricator.wikimedia.org/P11973 and previous config saved to /var/cache/conftool/dbconfig/20200721-065430-marostegui.json
* 06:27 _joe_: systemctl reset-failed on lists1001, a network interface was failing since 1 month
* 06:26 _joe_: enabling notifications for lists1001
* 06:23 _joe_: systemctl reset-failed on both centrallogs
* 02:43 eileen: civicrm revision changed from {{Gerrit|7f1e7d8e38}} to {{Gerrit|cc5d17fbaf}}, config revision is {{Gerrit|23460676f6}}
* 00:02 ryankemper: Began Elasticsearch reindex job on index `dewiki_content` across [`eqiad`, `codfw`, `cloudelastic`], on `rkemper@mwmaint1002` under tmux session `reindex`. Should complete in <24 hours


== 2020-07-20 ==
== 2021-08-02 ==
* 23:49 eileen: tools revision changed from {{Gerrit|b915d8efbd}} to {{Gerrit|22550f38c5}}
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:34 ejegg: updated fundraising CiviCRM from {{Gerrit|8b09c87ce2}} to {{Gerrit|7f1e7d8e38}}
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 23:12 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/ProofreadPage/ProofreadPage.namespaces.php: {{Gerrit|03ed74f0b9b8f55d01f9112c31f2f6ea17990f9c}}: Add ProofreadPage namespace translation for lij ([[phab:T257672|T257672]]) (duration: 00m 57s)
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:06 Urbanecm: run mwscript namespaceDupes.php --wiki=lijwikisource -- fix ([[phab:T257672|T257672]])
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:05 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2147774caaa0819f8b5d71cc16bc021d94677702}}: Add English aliases for WS-specific namespaces to lijwikisource ([[phab:T257672|T257672]]) (duration: 00m 57s)
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 22:59 ryankemper@deploy1001: Synchronized wmf-config/InitialiseSettings.php: 613669: cirrussearch: Allow 2 dewiki->content shards/node {{!}} https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/613669 (duration: 00m 57s)
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:53 eileen: tools revision changed from {{Gerrit|40d52a0008}} to {{Gerrit|b915d8efbd}}
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:15 sbassett: Revised mitigation deployed for [[phab:T257687|T257687]]
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:07 eileen: tools revision changed from {{Gerrit|711d671600}} to {{Gerrit|40d52a0008}}
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 19:10 mforns@deploy1001: Finished deploy [analytics/refinery@af86a05] (thin): Regular analytics weekly train THIN [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2] (duration: 00m 07s)
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 19:10 mforns@deploy1001: Started deploy [analytics/refinery@af86a05] (thin): Regular analytics weekly train THIN [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2]
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 19:09 mforns@deploy1001: Finished deploy [analytics/refinery@af86a05]: Regular analytics weekly train [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2] (duration: 05m 46s)
* 21:31 tzatziki: removing 1 file for legal compliance
* 19:03 mforns@deploy1001: Started deploy [analytics/refinery@af86a05]: Regular analytics weekly train [analytics/refinery@af86a05be470ed8283f6585afb5cc231b26944a2]
* 21:16 tzatziki: removing 7 files for legal compliance
* 18:37 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|df2584f181f08da0e1191f97e619e912e587b48d}}: Switch $wgUrlShortenerDomainsWhitelist --> $wgUrlShortenerAllowedDomains ([[phab:T255491|T255491]]) (duration: 00m 57s)
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|dfed4727c6f9e003f9e1949b2995a0cf0ad4f1cc}}: Adding rollbacker group for arzwiki ([[phab:T258100|T258100]]) (duration: 00m 57s)
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee7ac95e16f55e850b318f7354842795e08e0270}}: Change of rollbacker group settings at jawiki ([[phab:T258339|T258339]]) (duration: 00m 57s)
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:36 ejegg: updated payments-wiki settings to point TY page at new URL
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:32 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@10afb4b]: airflow: Turn off catchup on cirrus_namespace_map (duration: 00m 25s)
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 16:31 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@10afb4b]: airflow: Turn off catchup on cirrus_namespace_map
* 19:00 urbanecm: Morning B&C window completed
* 16:27 akosiaris: increase codfw mobileapps kubernetes traffic to 25% [[phab:T218733|T218733]]. Take #2
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 16:27 akosiaris@cumin1001: conftool action : set/weight=8; selector: dc=codfw,service=mobileapps,name=scb.*
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 15:59 elukey: restart airflow-webserver/scheduler to pick up TLS to mysql settings
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:21 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:21 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 15:17 hnowlan: draining and restarting sessionstore2002
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:17 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 15:17 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 15:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 15:13 jynus: dropping and recreating nagios@localhost users on all m1 servers
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 15:09 hnowlan: draining and restarting sessionstore2001
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 1/2) (duration: 00m 56s)
* 15:09 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:41 urbanecm: Create GrowthExperiments database tables for a bunch of wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|ee47f9d9a867f0bc419928c010579fb4f6fea425}}: Add rollbacker group for kswiki ([[phab:T286789|T286789]]) (duration: 00m 56s)
* 15:09 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:09 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eec997cf88437fc6e2e27a835301aef968c548c4}}: Enable SUL autologin for wikimania.wikimedia.org ([[phab:T285197|T285197]]) (duration: 00m 55s)
* 15:08 moritzm: draining restbase2023 for eventual reboot for kernel security update
* 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/modules/: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 2/2) (duration: 00m 56s)
* 15:00 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/extension.json: {{Gerrit|05cf1d6de1695d2e38531f3fecb26381f4dc0b1d}}: Add a link: Show article extract instead of description in the link inspector ([[phab:T287636|T287636]]; 1/2) (duration: 00m 57s)
* 14:56 moritzm: draining restbase2022 for eventual reboot for kernel security update
* 18:25 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:56 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cc8ca452e66994c211efd684b7ed3810bdc84aaf}}: Add tewikisource as import source for tewikibooks ([[phab:T286978|T286978]]) (duration: 00m 56s)
* 14:54 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|11e96bab3375d604126619169964a2db96808152}}: Add media.defense.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287264|T287264]]) (duration: 00m 56s)
* 14:52 hnowlan: draining and restarting sessionstore1003
* 18:11 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|97b68972108feaf52ab328991f563617f3594d81}}: Remove unused enwiki celebration logos ([[phab:T272108|T272108]]) (duration: 00m 57s)
* 14:52 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: {{Gerrit|16f97941b7d8eacc9bddae7bc570e03b031bead2}}: Remove unused eswiki celebration logos ([[phab:T280908|T280908]]) (duration: 00m 57s)
* 14:52 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 17:57 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 14:51 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 15:44 jynus: remove s2 from db1139 [[phab:T287230|T287230]]
* 14:51 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:49 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: NIC maintenance
* 14:49 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 13:21 volans: uploaded spicerack_0.0.57 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 14:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 14:47 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 13:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: apply 706049
* 14:47 moritzm: draining restbase2021 for eventual reboot for kernel security update
* 13:02 mutante: gerrit1001 - restarting service after 706049
* 14:44 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:00 mutante: gerrit1001 - re-enabling puppet, deploying sshd listening / firewall change
* 14:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 12:38 mutante: gerrit2001 - restarting gerrit after deploying 706049
* 14:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:20 mutante: gerrit servers: disabling puppet
* 14:36 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@ff49fdf]: Update mobileapps to {{Gerrit|0bf7bafa}} (duration: 03m 50s)
* 12:10 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/MobileFrontend/: [[phab:T287528|T287528]] (duration: 00m 57s)
* 14:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:08 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] (duration: 00m 57s)
* 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime
* 11:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1288.eqiad.wmnet
* 14:34 hnowlan: starting drain and restart of sessionstore hosts for new kernel
* 11:42 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1288.eqiad.wmnet
* 14:33 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1271.eqiad.wmnet
* 14:32 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@ff49fdf]: Update mobileapps to {{Gerrit|0bf7bafa}}
* 11:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287782|T287782]] (duration: 00m 56s)
* 14:26 moritzm: draining restbase2020 for eventual reboot for kernel security update
* 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1271.eqiad.wmnet
* 14:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:29 hashar: restarting gerrit primary server on gerrit1001
* 14:23 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 11:27 hashar: restarting Jenkins on contint2001
* 14:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:27 hashar: restarting Jenkins on contint1001
* 14:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:14 moritzm: draining restbase2019 for eventual reboot for kernel security update
* 11:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:08 ema: lvs101[34] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 11:18 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1288.eqiad.wmnet
* 14:07 ema: lvs1016 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 11:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:16 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1271.eqiad.wmnet
* 14:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:59 ema: lvs300[56] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 11:13 urbanecm: EU B&C window completed
* 13:57 ema: lvs3007 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|43020b72e8f466188d738aa73f2023f3017804d0}}: votewiki: Enable Single Transferable Vote ([[phab:T283728|T283728]]) (duration: 00m 57s)
* 13:50 ema: lvs500[12] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 11:08 moritzm: installing openjdk-11 security updates
* 13:48 moritzm: draining restbase2018 for eventual reboot for kernel security update
* 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26bcaafdcd57b1b7a78f9e0ad000325baaf36a72}}: Restore logging for mediamoderation script to better understand high error rate occurring when running script ([[phab:T287511|T287511]]) (duration: 00m 57s)
* 13:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:53 moritzm: catch up bullseye installs with latest state of testing
* 13:47 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 07:24 moritzm: installing libsndfile security updates on buster
* 13:47 ema: lvs5003 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 07:12 moritzm: installing aspell security updates
* 13:44 ema: lvs200[78] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 05:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:42 ema: lvs2010 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 04:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:34 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 02:01 tstarling@deploy1002: Synchronized src/defines.php: for consistency only, should have no production impact (duration: 00m 57s)
* 13:31 ema: lvs400[56] (primaries) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 13:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:27 moritzm: draining restbase2017 for eventual reboot for kernel security update
* 13:24 ema: lvs4007 (secondary) - restart pybal to apply varnish healthcheck changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/610047 [[phab:T255015|T255015]]
* 13:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:16 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:09 moritzm: draining restbase2016 for eventual reboot for kernel security update
* 13:08 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:08 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 13:07 moritzm: reset broken ifup systemd states on puppetdb* hosts
* 13:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:59 Urbanecm: creating arywiki ([[phab:T257674|T257674]]), lijwikisource ([[phab:T257672|T257672]]), sysop_itwiki ([[phab:T256545|T256545]]) done
* 12:59 moritzm: draining restbase2015 for eventual reboot for kernel security update
* 12:56 Urbanecm: Create Daimona Eaytoy at sysop_itwiki ([[phab:T256545|T256545]])
* 12:55 urbanecm@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 59s)
* 12:50 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating sysop_itwiki ([[phab:T256545|T256545]]) (duration: 00m 57s)
* 12:49 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating sysop_itwiki ([[phab:T256545|T256545]]) (duration: 00m 57s)
* 12:48 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating sysop_itwiki ([[phab:T256545|T256545]])
* 12:46 urbanecm@deploy1001: Synchronized dblists: Creating sysop_itwiki ([[phab:T256545|T256545]]) (duration: 00m 57s)
* 12:46 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:40 moritzm: draining restbase2014 for eventual reboot for kernel security update
* 12:38 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 12:38 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 12:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:34 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating lijwikisource ([[phab:T257672|T257672]]) (duration: 00m 57s)
* 12:32 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating lijwikisource ([[phab:T257672|T257672]])
* 12:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:30 urbanecm@deploy1001: Synchronized dblists: Creating lijwikisource ([[phab:T257672|T257672]]) (duration: 00m 56s)
* 12:28 urbanecm@deploy1001: Synchronized dblists/rtl.dblist: Add arywiki to rtl.dblist ([[phab:T257674|T257674]]) (duration: 00m 57s)
* 12:27 moritzm: draining restbase2013 for eventual reboot for kernel security update
* 12:27 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
* 12:21 urbanecm@deploy1001: Synchronized langlist: Creating arywiki ([[phab:T257674|T257674]]) (duration: 00m 56s)
* 12:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Creating arywiki ([[phab:T257674|T257674]]) (duration: 00m 56s)
* 12:19 urbanecm@deploy1001: Synchronized static/images/project-logos/: Creating arywiki ([[phab:T257674|T257674]]) (duration: 00m 57s)
* 12:17 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Creating arywiki ([[phab:T257674|T257674]])
* 12:16 urbanecm@deploy1001: Synchronized dblists: Creating arywiki ([[phab:T257674|T257674]]) (duration: 00m 57s)
* 12:02 moritzm: installing qemu security updates on buster
* 11:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|946bf3d239f278b4e099f5dec676f5e2be61d8ca}}: Update brwikimedia logo and add upscaled versions (config) ([[phab:T257925|T257925]]) (duration: 00m 57s)
* 11:49 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
* 11:49 Urbanecm: Purge 'https://en.wikipedia.org/static/images/project-logos/bnwikimedia.png'
* 11:46 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|f7560b6061dd3a60ccf56c916ebf70a3f104bea7}}: Update brwikimedia logo and add upscaled versions ([[phab:T257925|T257925]]) (duration: 00m 56s)
* 11:44 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|5b97a06fa2e9a06c251a9c1fd2ddd9beec01a683}}: Set $wgUrlShortenerAllowedDomains for all wikis ([[phab:T258134|T258134]]) (duration: 00m 57s)
* 11:42 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 00s)
* 11:36 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|c12f1dee6b9888849c64312c2a4fd65ecbd4091e}}: Remove wgPopupsPageBlacklist config setting ([[phab:T254676|T254676]]) (duration: 00m 57s)
* 11:35 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript createAndPromote.php testwikidatawiki --custom-groups=interface-admin --force 'Lucas Werkmeister (WMDE)'
* 11:34 urbanecm@deploy1001: sync-file aborted: (no justification provided) (duration: 00m 01s)
* 11:25 Urbanecm: mwscript namespaceDupes.php --wiki=kowikiquote  --fix ([[phab:T255031|T255031]])
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3719668511231589b4fc6a723ccdfa772068ad5f}}: Add NamespaceAliases for kowikiquote ([[phab:T255031|T255031]]) (duration: 00m 57s)
* 11:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bc5671a90c65b66989e470fc41225986b2ec9fb5}}: Add media.farsnews.ir to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T253800|T253800]]) (duration: 00m 57s)
* 11:18 Urbanecm: Run mwscript updateCollation.php --wiki=bswiktionary --previous-collation=uppercase in a tmux session at mwmaint1002 ([[phab:T258346|T258346]])
* 11:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0c784784d75c2bbfb570495a6a097d4c44cbe6b3}}: Set $wgCategoryCollation to uca-bs-u-kn on Bosnian Wiktionary ([[phab:T258346|T258346]]) (duration: 00m 58s)
* 11:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6830723b0ad5031e67062ba838f09cd07c2b97a1}}: Convert ukwikisource ns:250 and ns:251 to have subpages ([[phab:T255930|T255930]]) (duration: 00m 57s)
* 11:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|1c7a6215d06aff6cb0a75701292d8147f006d9e4}}: Create closer group at itwikinews ([[phab:T257927|T257927]]) (duration: 00m 57s)
* 10:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:48 moritzm: rebooting releases* hosts for kernel security update
* 10:35 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:614698{{!}} Bumping portals to master (614698)]] (duration: 00m 56s)
* 10:34 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:614698{{!}} Bumping portals to master (614698)]] (duration: 00m 59s)
* 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1114', diff saved to https://phabricator.wikimedia.org/P11962 and previous config saved to /var/cache/conftool/dbconfig/20200720-103058-marostegui.json
* 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P11961 and previous config saved to /var/cache/conftool/dbconfig/20200720-094609-marostegui.json
* 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P11960 and previous config saved to /var/cache/conftool/dbconfig/20200720-093154-marostegui.json
* 09:25 godog: update compiler facts
* 09:17 jayme: updating envoyproxy to 1.14.4-1 on all eqiad hosts
* 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1114', diff saved to https://phabricator.wikimedia.org/P11959 and previous config saved to /var/cache/conftool/dbconfig/20200720-091119-marostegui.json
* 09:04 jayme: updating envoyproxy to 1.14.4-1 on all codfw hosts
* 07:54 moritzm: installing libopenmpt security updates
* 07:51 jayme: updating envoyproxy to 1.14.4-1 on all non mw and restbase hosts
* 07:29 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 - [[phab:T255408|T255408]]
* 07:19 marostegui: Drop non used reviewdb database - [[phab:T255715|T255715]]
* 06:55 elukey: restart matomo1002's mariadb to pick up new TLS settings
* 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P11958 and previous config saved to /var/cache/conftool/dbconfig/20200720-065438-marostegui.json
* 06:15 tstarling@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Score/includes/Score.php: reverting Reedy's temporary patch for hardcoding the lilypond version (duration: 00m 57s)
* 06:07 tstarling@deploy1001: Finished scap: fixing missing message from previous sync-dir (duration: 29m 57s)
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082 after a crash [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P11957 and previous config saved to /var/cache/conftool/dbconfig/20200720-055614-marostegui.json
* 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 after a crash [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P11956 and previous config saved to /var/cache/conftool/dbconfig/20200720-054747-marostegui.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 after a crash [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P11955 and previous config saved to /var/cache/conftool/dbconfig/20200720-053816-marostegui.json
* 05:37 tstarling@deploy1001: Started scap: fixing missing message from previous sync-dir
* 05:30 tstarling@deploy1001: scap sync-l10n completed (1.35.0-wmf.41) (duration: 02m 44s)
* 05:25 marostegui: Deploy MCR schema change on enwiki on db1119 - [[phab:T238966|T238966]]
* 05:24 tstarling@deploy1001: Synchronized wmf-config/CommonSettings.php: disable lilypond with better error message (duration: 00m 57s)
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082 after a crash [[phab:T258336|T258336]]', diff saved to https://phabricator.wikimedia.org/P11953 and previous config saved to /var/cache/conftool/dbconfig/20200720-051846-marostegui.json
* 05:18 tstarling@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Score: better error message for disabling of Score (duration: 01m 10s)


== 2020-07-19 ==
== 2021-07-31 ==
* 19:16 marostegui: Upgrade and reboot db1085 [[phab:T258360|T258360]]
* 12:40 reedy@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll/: [[phab:T287780|T287780]] [[phab:T287782|T287782]] (duration: 00m 58s)
* 18:57 marostegui: Start mysql on db1082 [[phab:T258336|T258336]]
* 00:01 eileen: civicrm revision changed from {{Gerrit|158ed65e00}} to {{Gerrit|d6baf291f4}}, config revision is {{Gerrit|6011d9c471}}
* 18:51 marostegui: Upgrade and reboot db1082 [[phab:T258336|T258336]]
* 18:45 cdanis@cumin1001: dbctl commit (dc=all): 'db1085 also crashed', diff saved to https://phabricator.wikimedia.org/P11952 and previous config saved to /var/cache/conftool/dbconfig/20200719-184511-cdanis.json
* 18:06 Urbanecm: Run mwscript emptyUserGroup.php --wiki=testwiki contestadmin ([[phab:T256555|T256555]])


== 2020-07-18 ==
== 2021-07-30 ==
* 21:41 shdubsh: restart logstash on logstash200[456]
* 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 21:14 shdubsh: bounce logstash on logstash1007
* 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 21:10 shdubsh: bounce logstash on logstash1008
* 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 21:06 shdubsh: bounce logstash on logstash1009
* 22:20 eileen: civicrm revision is {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}
* 20:52 marostegui: Due to db1082 crash there will be replication lag on s5 on labsdb hosts - [[phab:T258336|T258336]]
* 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 20:37 cdanis@cumin1001: dbctl commit (dc=all): 'depool db1082, it crashed', diff saved to https://phabricator.wikimedia.org/P11951 and previous config saved to /var/cache/conftool/dbconfig/20200718-203704-cdanis.json
* 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 00:13 dpifke: Performing one-time expiration of ArcLamp files older than 40 days (normal retention is 45 days), to solve disk space issue until either Ganeti issue is solved or compressed logfile support is merged.
* 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
* 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - [[phab:T287760|T287760]] (duration: 00m 57s)
* 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
* 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
* 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
* 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
* 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
* 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
* 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
* 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
* 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
* 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
* 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
* 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
* 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
* 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
* 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
* 13:26 joe: uploaded docker-report 0.0.13 to buster
* 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
* 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
* 11:23 moritzm: installing libsndfile security updates on stretch
* 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
* 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
* 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
* 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
* 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. [[phab:T284592|T284592]]
* 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
* 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
* 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
* 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
* 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
* 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
* 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails [[phab:T286273|T286273]] (duration: 00m 57s)
* 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrae', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json


== 2020-07-17 ==
== 2021-07-29 ==
* 21:16 dpifke: Removing MongoDB packages and data from webperf1002.
* 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708870{{!}}Merge new configs with existing testwiki definition]] (duration: 00m 57s)
* 17:39 dpifke@deploy1001: Finished deploy [performance/arc-lamp@a5d2fd3]: (no justification provided) (duration: 00m 05s)
* 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 17:38 dpifke@deploy1001: Started deploy [performance/arc-lamp@a5d2fd3]: (no justification provided)
* 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: [[gerrit:708644{{!}}Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704)]] (duration: 01m 09s)
* 13:53 akosiaris: powercycle kubernetes2002
* 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15  refs [[phab:T281157|T281157]]
* 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1104', diff saved to https://phabricator.wikimedia.org/P11944 and previous config saved to /var/cache/conftool/dbconfig/20200717-122400-marostegui.json
* 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P11941 and previous config saved to /var/cache/conftool/dbconfig/20200717-120126-marostegui.json
* 18:37 urbanecm@deploy1002: Finished scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]]) (duration: 17m 06s)
* 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P11940 and previous config saved to /var/cache/conftool/dbconfig/20200717-115155-marostegui.json
* 18:19 urbanecm@deploy1002: Started scap: {{Gerrit|796fe8e}}: {{Gerrit|927763c}}: SecurePoll backports ([[phab:T283728|T283728]], [[phab:T284585|T284585]])
* 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1104', diff saved to https://phabricator.wikimedia.org/P11939 and previous config saved to /var/cache/conftool/dbconfig/20200717-113800-marostegui.json
* 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104', diff saved to https://phabricator.wikimedia.org/P11938 and previous config saved to /var/cache/conftool/dbconfig/20200717-113050-marostegui.json
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: {{Gerrit|9a2383d7ecfe1874c08f38a08d174364a12ad247}}: Display: Use HTML "dir" attribute for ltr/rtl ([[phab:T287649|T287649]]) (duration: 01m 25s)
* 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104', diff saved to https://phabricator.wikimedia.org/P11937 and previous config saved to /var/cache/conftool/dbconfig/20200717-112413-marostegui.json
* 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 09:15 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1280.eqiad.wmnet
* 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
* 09:12 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1280.eqiad.wmnet
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:48 moritzm: imported prometheus-atlas-exporter 1.0+git20191204.ffafab7-2 to buster-wikimedia [[phab:T247967|T247967]]
* 15:11 mmandere: pool lvs1013.eqiad.wmnet - [[phab:T286032|T286032]]
* 08:29 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:09 mmandere: pool dns1001.wikimedia.org - [[phab:T286032|T286032]]
* 08:05 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 07:54 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104', diff saved to https://phabricator.wikimedia.org/P11936 and previous config saved to /var/cache/conftool/dbconfig/20200717-075124-marostegui.json
* 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1111', diff saved to https://phabricator.wikimedia.org/P11935 and previous config saved to /var/cache/conftool/dbconfig/20200717-074335-marostegui.json
* 14:46 mmandere: depool lvs1013 - [[phab:T286032|T286032]]
* 07:34 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 07:34 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
* 07:33 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:39 mmandere: depool dns1001 - [[phab:T286032|T286032]]
* 07:33 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 07:32 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 07:30 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 06:30 XioNoX: rename msw1-codfw interface range
* 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
* 06:28 XioNoX: rename msw1-eqiad interface range
* 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - [[phab:T286032|T286032]]
* 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P11934 and previous config saved to /var/cache/conftool/dbconfig/20200717-044748-marostegui.json
* 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 04:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1092', diff saved to https://phabricator.wikimedia.org/P11933 and previous config saved to /var/cache/conftool/dbconfig/20200717-044658-marostegui.json
* 14:11 vgutierrez: restart pybal on lvs2009
* 14:09 vgutierrez: restart pybal on lvs2010
* 14:07 vgutierrez: restart pybal on lvs2008
* 14:05 vgutierrez: restart pybal on lvs2007
* 13:59 vgutierrez: restart pybal on lvs1014
* 13:55 vgutierrez: restart pybal on lvs1015
* 13:52 _joe_: restarting pybal on lvs1016
* 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
* 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
* 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
* 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
* 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
* 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
* 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
* 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
* 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
* 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
* 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
* 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
* 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
* 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 07:52 moritzm: restarting Tomcat on idp-test
* 06:41 XioNoX: push pfw policies - [[phab:T287203|T287203]]
* 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
* 01:08 eileen: civicrm revision changed from {{Gerrit|739c936298}} to {{Gerrit|158ed65e00}}, config revision is {{Gerrit|6011d9c471}}


== 2020-07-16 ==
== 2021-07-28 ==
* 22:15 mutante: testreduce1001 manually git clone 'scandium' branch of integration/visualdiff into /srv/visualdiff ([[phab:T257906|T257906]])
* 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708581{{!}}wgSkipSkins: Update defaults, hide modern (T287616)]] (duration: 01m 06s)
* 21:54 crusnov@deploy1001: Finished deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 3 (duration: 01m 49s)
* 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: [[gerrit:708158{{!}}Disable mobile contributions simplifications on Wikidata and Commons (T283988)]] (duration: 01m 58s)
* 21:52 crusnov@deploy1001: Started deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 3
* 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]] (duration: 01m 06s)
* 21:42 crusnov@deploy1001: Finished deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 2 (duration: 01m 33s)
* 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16  refs [[phab:T281157|T281157]]
* 21:41 crusnov@deploy1001: Started deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 part 2
* 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
* 21:40 crusnov@deploy1001: Finished deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7 (duration: 01m 01s)
* 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
* 21:39 crusnov@deploy1001: Started deploy [netbox/deploy@39c5cae]: Deploying Netbox 2.8.7
* 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
* 21:08 cstone: payments-wiki revision changed from {{Gerrit|91852dbc9b}} to {{Gerrit|bf91f8adff}}
* 18:14 jbond: manually cleared out the puppetdb2002 queue
* 20:32 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable client error logging on Catalan Wikipedia ([[phab:T258073|T258073]]) (duration: 00m 57s)
* 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
* 19:32 sbassett: Deployed mitigations for [[phab:T257687|T257687]]
* 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 19:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T248418|T248418]] TimedMediaHandler: Make videojs the only player on all group0 (duration: 00m 57s)
* 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 18:54 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 18:53 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 18:50 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:58 ryankemper: [[phab:T287112|T287112]] [WDQS] Re-pooled `wdqs2002`
* 18:49 addshore: deployment windows finished with
* 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
* 18:46 addshore@deploy1001: Synchronized wmf-config/extension-list: [[gerrit:611393]] extension-list: Load WikibaseClient via JSON (duration: 00m 56s)
* 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing ([[phab:T279309|T279309]])
* 18:36 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:613226]] Wikibase: Always set wgWBRepoSettings idGeneratorSeparateDbConnection PT 2/2 (duration: 00m 56s)
* 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
* 18:35 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:613226]] Wikibase: Always set wgWBRepoSettings idGeneratorSeparateDbConnection PT 1/2 (duration: 00m 56s)
* 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
* 18:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:613165]] [[phab:T138104|T138104]] Wikibase: stop setting wmgWikibaseTmpSerializeEmptyListsAsObjects (duration: 00m 57s)
* 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
* 18:23 addshore@deploy1001: Synchronized wmf-config/config/incubatorwiki.yaml: [[gerrit:613199]] [[phab:T256957|T256957]] Move VisualEditor from beta to default on incubatorwiki PT2/2 (duration: 00m 57s)
* 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
* 18:22 addshore@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: [[gerrit:613199]] [[phab:T256957|T256957]] Move VisualEditor from beta to default on incubatorwiki PT1/2 (duration: 00m 56s)
* 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
* 18:20 addshore@deploy1001: Synchronized wmf-config/config/nlwikimedia.yaml: [[gerrit:613198]] [[phab:T256142|T256142]] Move VisualEditor from beta to default on nlwikimedia PT2/2 (duration: 00m 57s)
* 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 18:18 addshore@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: [[gerrit:613198]] [[phab:T256142|T256142]] Move VisualEditor from beta to default on nlwikimedia PT1/2 (duration: 00m 56s)
* 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:14 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:613164]] [[phab:T138104|T138104]] Wikibase: stop setting wgWBRepoSettings tmpSerializeEmptyListsAsObjects (duration: 00m 57s)
* 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 18:12 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:613192]] [[phab:T246420|T246420]] Enable limited-width layout for Modern Vector (duration: 00m 56s)
* 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 18:08 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:612870]] [[phab:T246977|T246977]] Disable affinity quicksurveys for the following wikis (duration: 00m 57s)
* 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
* 18:03 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 18:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime
* 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 17:54 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
* 17:53 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 17:50 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 17:50 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 17:49 herron@cumin1001: START - Cookbook sre.ganeti.makevm
* 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 17:17 XioNoX: msw1-eqiad delete unused VC-ports
* 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 17:05 XioNoX: msw1-codfw - replace member-range with list of individual interfaces
* 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 16:45 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/: Backport: [[gerrit:613173{{!}}Re add OtherProjectsSidebarGenerator::buildProjectLinkSidebarFromItemId (T258184)]] (duration: 01m 02s)
* 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
* 16:11 effie: reboot rdb1009 - [[phab:T254990|T254990]]
* 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
* 16:06 effie: Reboot rdb1010 - [[phab:T254990|T254990]]
* 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
* 15:51 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/: Backport: [[gerrit:613170{{!}}Revert "Revert "Removes OtherProjectsSidebar hook"" (T258184)]] (duration: 01m 02s)
* 13:29 moritzm: installing python2.7 security updates on stretch
* 15:40 lucaswerkmeister-wmde@deploy1001: scap failed: average error rate on 7/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/e474f13ffac6b8c3bf919c4aeafc8c9b for details)
* 13:08 moritzm: installing python3.5 security updates on stretch
* 15:15 akosiaris: lower codfw mobileapps kubernetes traffic to 10% [[phab:T218733|T218733]]. Will open up task for it
* 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:15 akosiaris@cumin1001: conftool action : set/weight=24; selector: dc=codfw,service=mobileapps,name=scb.*
* 11:27 moritzm: installing nginx security updates on thumbor*
* 15:07 XioNoX: repool eqsin - [[phab:T257154|T257154]]
* 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
* 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 15:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 15:00 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 10:11 moritzm: installing remaining nginx security updates on stretch
* 14:54 XioNoX: load config on cr3-eqsin - [[phab:T257154|T257154]]
* 10:09 godog: temp fix prometheus-icinga-am on alert1001
* 14:54 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/: Backport: [[gerrit:613167{{!}}Avoid trying to register wikibase.Site twice (T258065)]] (duration: 01m 03s)
* 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:43 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mathoid' for release 'production' .
* 09:40 urbanecm: Start server-side upload for 1 video file ([[phab:T287482|T287482]])
* 14:31 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:12 moritzm: rebooting webperf hosts in eqiad for kernel update
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 14:09 XioNoX: upgrade junos on cr3-eqsin - [[phab:T257154|T257154]]
* 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 14:03 jayme: published image docker-registry.discovery.wmnet/envoy:1.14.4-1
* 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 13:47 XioNoX: remove nonstop-bridging from asw1-eqsin
* 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
* 13:36 XioNoX: power-off cr3-eqsin - [[phab:T257154|T257154]]
* 08:27 Amir1: running several long-running queries against pc1007
* 13:36 akosiaris: increase codfw mobileapps kubernetes traffic to 25% [[phab:T218733|T218733]]
* 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:35 akosiaris@cumin1001: conftool action : set/weight=8; selector: dc=codfw,service=mobileapps,name=scb.*
* 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:30 XioNoX: deactivate BGP groups IX/Transit/PyBal on cr3-eqsin - [[phab:T257154|T257154]]
* 07:53 moritzm: installing aspell security updates on stretch
* 13:27 moritzm: installing an-tool1008
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 13:23 XioNoX: depool eqsin for cr3 replacement - [[phab:T257154|T257154]]
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: [[phab:T287559|T287559]]
* 13:13 volans@deploy1001: Finished deploy [homer/deploy@fcf4332]: Force deploy of the homer plugin (duration: 01m 27s)
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 13:12 volans@deploy1001: Started deploy [homer/deploy@fcf4332]: Force deploy of the homer plugin
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: [[phab:T287559|T287559]]
* 13:04 kormat: restarting tendril to pick up new mariadb config [[phab:T257816|T257816]]
* 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 13:02 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.41
* 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: [[phab:T287559|T287559]]
* 13:02 akosiaris: increase codfw mobileapps kubernetes traffic to 10% [[phab:T218733|T218733]]
* 07:07 godog: remove cloud*/syslog.log from centrallog2001 - [[phab:T287559|T287559]]
* 13:01 akosiaris@cumin1001: conftool action : set/weight=24; selector: dc=codfw,service=mobileapps,name=scb.*
* 07:06 godog: remove node_pinger.prom from node-pinger hosts
* 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092', diff saved to https://phabricator.wikimedia.org/P11926 and previous config saved to /var/cache/conftool/dbconfig/20200716-125643-marostegui.json
* 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
* 12:56 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR607011 (duration: 04m 32s)
* 02:43 TimStarling: on mwmaint2002 fixing [[phab:T286273|T286273]] broken files using eval.php
* 12:52 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR607011
* 12:42 ayounsi@deploy1001: Finished deploy [homer/deploy@fcf4332]: CR607011 (duration: 03m 42s)
* 12:38 ayounsi@deploy1001: Started deploy [homer/deploy@fcf4332]: CR607011
* 12:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:36 akosiaris@cumin1001: conftool action : set/weight=50; selector: dc=codfw,service=mobileapps,name=scb.*
* 12:35 akosiaris: increase codfw mobileapps kubernetes traffic to 5% [[phab:T218733|T218733]]
* 12:35 akosiaris: increase codfw mobileapps kubernetes traffic to 5%
* 12:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 12:22 jmm@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 12:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:12 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:12 jmm@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 12:12 jmm@cumin1001: START - Cookbook sre.ganeti.makevm
* 12:08 jayme: updated envoyproxy to 1.14.4-1 on mw-canary and restbase-canary
* 11:44 XioNoX: remove BGP to AS396253 in eqdfw (peer left the IX)
* 11:26 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/UrlShortener/includes/UrlShortenerUtils.php: [[phab:T258134|T258134]] Fix config variables regex concatenation (duration: 01m 05s)
* 11:23 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[phab:T254315|T254315]] [[gerrit:612670]] Wikibase: remove wmgWikibaseLocalEntitySourceName (duration: 01m 05s)
* 11:18 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T254315|T254315]] [[phab:T257266|T257266]] [[gerrit:609988]] Wikidata client wikis: Define entity sources configuration (take 3) (duration: 01m 08s)
* 10:17 jbond42: upgrade to hiera5
* 10:08 jbond42: disable puppet for hiera5 deployment
* 09:37 jayme: updated envoyproxy to 1.14.4-1 on mw1325.eqiad.wmnet and restbase1026.eqiad.wmnet
* 09:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:30 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:15 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 09:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:15 moritzm: rebooting flowspec1001
* 08:52 jayme: updated envoyproxy to 1.14.4-1 on mwdebug1001.eqiad.wmnet
* 08:41 moritzm: installing sqlite3 security updates
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2081', diff saved to https://phabricator.wikimedia.org/P11924 and previous config saved to /var/cache/conftool/dbconfig/20200716-083954-marostegui.json
* 08:35 XioNoX: Remove PIM/IGMP related CR stanza (acls) - [[phab:T257573|T257573]]
* 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 08:26 moritzm: installing dbus security updates
* 08:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:24 XioNoX: remove igmp-snooping from access switches - [[phab:T257573|T257573]]
* 08:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:15 moritzm: installing python-urllib3 security updates
* 08:15 XioNoX: remove PIM config from eqord/eqdfw/knams routers - [[phab:T257573|T257573]]
* 08:14 XioNoX: remove PIM config from eqiad routers - [[phab:T257573|T257573]]
* 08:11 XioNoX: remove PIM config from esams routers - [[phab:T257573|T257573]]
* 08:09 XioNoX: remove PIM config from eqsin routers - [[phab:T257573|T257573]]
* 08:08 jbond42: update mail delivery for phabricator to use phabricator.discovery.wmnet cname
* 08:07 XioNoX: remove PIM config from codfw routers - [[phab:T257573|T257573]]
* 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P11923 and previous config saved to /var/cache/conftool/dbconfig/20200716-080613-marostegui.json
* 08:03 XioNoX: remove PIM config from ulsfo routers - [[phab:T257573|T257573]]
* 07:41 jayme: imported envoyproxy_1.14.4-1 to stretch-wikimedia
* 07:31 jayme: imported envoyproxy_1.14.4-1 to buster-wikimedia
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1131', diff saved to https://phabricator.wikimedia.org/P11922 and previous config saved to /var/cache/conftool/dbconfig/20200716-072838-marostegui.json
* 07:25 marostegui: Drop database reviewdb-test [[phab:T255715|T255715]]
* 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1131', diff saved to https://phabricator.wikimedia.org/P11921 and previous config saved to /var/cache/conftool/dbconfig/20200716-070331-marostegui.json
* 06:40 XioNoX: remove peering with AS8403 in eqsin (peer left the IX)
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1131', diff saved to https://phabricator.wikimedia.org/P11920 and previous config saved to /var/cache/conftool/dbconfig/20200716-051342-marostegui.json
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1131', diff saved to https://phabricator.wikimedia.org/P11919 and previous config saved to /var/cache/conftool/dbconfig/20200716-051109-marostegui.json


== 2020-07-15 ==
== 2021-07-27 ==
* 23:54 eileen: tools revision changed from {{Gerrit|7b6018a16e}} to {{Gerrit|711d671600}}
* 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: [[gerrit:708220{{!}}Restore print, links, table and message box styles (T278896)]] (duration: 01m 07s)
* 23:50 eileen: process-control config revision is {{Gerrit|1fc4a9686d}}
* 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708152{{!}}Enable user links on office + test wikis (T287391)]] (duration: 02m 00s)
* 23:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
* 23:04 bd808: tools.admin Removed valhallasw from maintainers ([[phab:T255697|T255697]])
* 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
* 23:02 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
* 22:58 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
* 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
* 22:52 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99)
* 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
* 22:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
* 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
* 22:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - [[phab:T287210|T287210]] (duration: 02m 28s)
* 22:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
* 22:29 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:29 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:27 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
* 22:21 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
* 22:21 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
* 22:10 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 18:16 brennen: restarting jenkins for upgrade
* 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 18:00 mutante: DNS - new language 'avk' has been added - This language is called Kotava and is "a proposed international auxiliary language (IAL) that focuses especially on the principle of cultural neutrality". Learn more at https://en.wikipedia.org/wiki/Kotava
* 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 17:32 mutante: puppetmaster - revoking cert for planet.discovery.wmnet, add planet.wikimedia.org, remove planet.svc records, remove specific and outdated hostnames ([[phab:T257840|T257840]])
* 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
* 16:11 moritzm: uploaded jenkins 2.235.2 to thirdparty/ci for stretch/buster [[phab:T257614|T257614]]
* 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
* 15:29 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 15:24 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 15:24 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 15:20 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
* 15:20 moritzm: rebooting webperf* hosts for kernel update
* 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 14:58 addshore@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/Wikibase/repo: [[gerrit:612723]] Stop checking if WikibaseLib is loaded [[phab:T258062|T258062]] (already on mwmaint1002) (duration: 01m 08s)
* 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 14:51 addshore: pulled https://gerrit.wikimedia.org/r/612723 onto mwmaint 1002 ahead of syncing everywhere (and CI finishing)
* 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:37 ema: A:cp: upgrade purged to 0.17 [[phab:T257573|T257573]]
* 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - [[phab:T287238|T287238]]
* 14:30 ema: upload purged 0.17 to buster-wikimedia [[phab:T257573|T257573]]
* 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) [[phab:T147505|T147505]]
* 14:28 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add exceptional wikitech VE/Parsoid config [[phab:T241961|T241961]] (duration: 01m 04s)
* 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
* 14:26 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Add exceptional wikitech VE/Parsoid config [[phab:T241961|T241961]] (duration: 01m 05s)
* 15:17 mmandere: pool lvs1014.eqiad.wmnet - [[phab:T286061|T286061]]
* 14:25 gehel: repooling wdqs1006 - catched up on lag
* 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
* 14:12 akosiaris: increase codfw mobileapps kubernetes traffic to 2% [[phab:T218733|T218733]]
* 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
* 14:10 akosiaris@cumin1001: conftool action : set/weight=132; selector: dc=codfw,service=mobileapps,name=scb.*
* 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 [[phab:T286061|T286061]]
* 13:58 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/UrlShortener/includes/UrlShortenerUtils.php: [[phab:T258056|T258056]] Add temporary fix to ensure array is passed to array_map() (duration: 01m 08s)
* 15:11 mmandere: pool authdns1001.wikimedia.org - [[phab:T286061|T286061]]
* 13:54 akosiaris: pool kubernetes nodes for mobileapps in codfw
* 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
* 13:53 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,service=mobileapps,name=kubernetes.*
* 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - [[phab:T286061|T286061]]
* 13:53 akosiaris@cumin1001: conftool action : set/weight=264; selector: dc=codfw,service=mobileapps,name=scb.*
* 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
* 13:51 akosiaris@cumin1001: conftool action : set/weight=1; selector: dc=codfw,service=mobileapps,name=kubernetes.*
* 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
* 13:04 jforrester@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.41 (duration: 01m 05s)
* 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 13:03 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.41
* 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
* 11:59 addshore: deploy window closed / done :)
* 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
* 11:57 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:609987]] Commons: Define entity sources configuration (take 2) [[phab:T254315|T254315]] (duration: 01m 03s)
* 14:53 moritzm: disabling puppet for upcoming row B maintenance
* 11:36 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:612668]] Wikibase test: Client local entity sources are always testwikidata [[phab:T254315|T254315]] (duration: 01m 05s)
* 14:52 mmandere: depool lvs1014 - [[phab:T286061|T286061]]
* 11:27 addshore@deploy1001: Synchronized wmf-config: [[phab:T254315|T254315]] [[gerrit:612669]] Wikidata test: Split client db lists. PT2/2 (duration: 01m 06s)
* 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
* 11:26 addshore@deploy1001: Synchronized dblists/wikidataclient.dblist: [[phab:T254315|T254315]] [[gerrit:612669]] Wikidata test: Split client db lists. PT1/2 (duration: 01m 05s)
* 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
* 11:16 XioNoX: remove as-path prepending in esams
* 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 11:11 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: LABS [[gerrit:612667]] Wikibase labs: All client "local" entity sources are wikidata [[phab:T254315|T254315]] (duration: 01m 04s)
* 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
* 11:08 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:612666]] Wikibase: Split localEntitySourceName config for repo and client [[phab:T254315|T254315]] (duration: 01m 16s)
* 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 11:05 XioNoX: re-enable ping offload in esams
* 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
* 11:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 11:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
* 10:56 XioNoX: disable ping offload in esams
* 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 10:55 XioNoX: re-enable ping offload in codfw
* 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 10:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 10:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
* 10:45 XioNoX: disable ping offload in codfw
* 14:40 mmandere: depool authdns1001 - [[phab:T286061|T286061]]
* 10:44 XioNoX: re-enable ping offload in eqiad
* 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
* 10:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 10:41 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
* 10:31 XioNoX: disable ping offload in eqiad
* 14:33 mmandere: depool cp10[79-82]).eqiad.wmnet - [[phab:T286061|T286061]]
* 10:31 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
* 10:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'nontls' .
* 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
* 10:30 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - [[phab:T287238|T287238]]
* 10:30 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
* 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11916 and previous config saved to /var/cache/conftool/dbconfig/20200715-102605-marostegui.json
* 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
* 10:20 jayme: updating python3-docker-report to 0.0.5-1 on deneb
* 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11915 and previous config saved to /var/cache/conftool/dbconfig/20200715-100855-marostegui.json
* 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 10:07 jayme: imported docker-report_0.0.5-1 to buster-wikimedia
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 09:48 marostegui: Deploy schema change on s8 codfw master, lag will appear on codfw [[phab:T256685|T256685]]
* 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11914 and previous config saved to /var/cache/conftool/dbconfig/20200715-094226-marostegui.json
* 14:11 moritzm: installing aspell security updates
* 09:22 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 09:21 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:19 akosiaris: deploy mobileapps in kubernetes to talk HTTPS to the mw API
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:10 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 09:10 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 09:07 akosiaris: Correction: deploy eventgate-analytics-external in staging, eqiad, codfw for switching to using discovery records and HTTPS for talking to the API
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:06 akosiaris: deploy eventgate-analytics in staging, eqiad, codfw for switching to using discovery records and HTTPS for talking to the API
* 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 09:06 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:06 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P11913 and previous config saved to /var/cache/conftool/dbconfig/20200715-090545-marostegui.json
* 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 09:04 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
* 09:04 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1120 after reimage', diff saved to https://phabricator.wikimedia.org/P11912 and previous config saved to /var/cache/conftool/dbconfig/20200715-085032-marostegui.json
* 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
* 08:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 08:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 08:19 moritzm: piwik.wikimedia.org switched to CAS authentication
* 13:30 ottomata: deploying eventgate-analytics with native prometheus support. Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
* 08:19 elukey: move piwik.wikimedia.org to CAS (idp.wikimedia.org)
* 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
* 07:29 XioNoX: delete deprecated AS3209 AMS-IX router
* 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
* 06:59 dcausse: depooling wdqs1006 (high lag)
* 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
* 06:09 marostegui: Stop replication on db1120 to avoid having 10.4 -> 10.1 replication for long [[phab:T254871|T254871]]
* 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
* 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 for reimage [[phab:T254871|T254871]]', diff saved to https://phabricator.wikimedia.org/P11911 and previous config saved to /var/cache/conftool/dbconfig/20200715-060649-marostegui.json
* 11:23 Lucas_WMDE: EU backport+config window done
* 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1103 to x1 master [[phab:T254871|T254871]]', diff saved to https://phabricator.wikimedia.org/P11910 and previous config saved to /var/cache/conftool/dbconfig/20200715-060145-marostegui.json
* 11:20 oblivian@deploy1002: Synchronized debug.json: Config: [[gerrit:708255{{!}}Add the experimental kubernetes backend to mwdebug (T283056)]] (duration: 00m 56s)
* 06:00 marostegui: Starting x1 failover from db1120 to db1103 - [[phab:T254871|T254871]]
* 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704456{{!}}Add stream configuration for ContentTranslation events (T281982)]] (duration: 00m 58s)
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 ', diff saved to https://phabricator.wikimedia.org/P11909 and previous config saved to /var/cache/conftool/dbconfig/20200715-052939-marostegui.json
* 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
* 04:46 marostegui: Start x1 pre failover steps [[phab:T254871|T254871]]
* 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1103 weight to 0 before the switchover [[phab:T254871|T254871]]', diff saved to https://phabricator.wikimedia.org/P11908 and previous config saved to /var/cache/conftool/dbconfig/20200715-044432-marostegui.json
* 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
* 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1135', diff saved to https://phabricator.wikimedia.org/P11907 and previous config saved to /var/cache/conftool/dbconfig/20200715-044332-marostegui.json
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
* 01:45 eileen: tools revision changed from {{Gerrit|a9e7dc1559}} to {{Gerrit|7b6018a16e}}
* 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
* 00:26 ryankemper@deploy1001: Finished deploy [wdqs/wdqs@8f6f660]: 0.3.41 (duration: 15m 10s)
* 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
* 00:11 ryankemper@deploy1001: Started deploy [wdqs/wdqs@8f6f660]: 0.3.41
* 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
* 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
* 09:52 jynus: reverting query killer parameters on s3 codfw replicas
* 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
* 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
* 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
* 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
* 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
* 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
* 08:57 _joe_: repooling mw225[12] for apis
* 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
* 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgraed', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
* 08:36 jynus: reenabled puppet on mwmaint1002
* 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
* 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
* 07:52 jynus: disabling puppet on mwmaint1002
* 07:14 moritzm: installing krb security updates on buster
* 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - [[phab:T287238|T287238]]
* 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part II (duration: 00m 56s)
* 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708204{{!}}Enable request language for RDF stubs in testwikidatawiki (T285795)]], Part I (duration: 00m 57s)
* 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 [[phab:T287230|T287230]]', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json


== 2020-07-14 ==
== 2021-07-26 ==
* 19:52 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/vendor/wikimedia/parsoid/: [[phab:T252448|T252448]] [[phab:T255190|T255190]] Bump Parsoid to v0.12.0-a23 (duration: 01m 06s)
* 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
* 18:13 ryankemper: all long-running elasticsearch reindex jobs are complete
* 18:30 cstone: SmashPig revision changed from {{Gerrit|be272c02ce}} to {{Gerrit|020d4eccd4}},
* 18:09 jforrester@deploy1001: Synchronized dblists/: [[phab:T32405|T32405]] [[phab:T254287|T254287]] Remove the mobilemainpagelegacy dblist (duration: 01m 04s)
* 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - [[phab:T287394|T287394]]
* 18:07 jforrester@deploy1001: Synchronized multiversion/MWConfigCacheGenerator.php: [[phab:T32405|T32405]] [[phab:T254287|T254287]] Stop loading the mobilemainpagelegacy dblist (duration: 01m 05s)
* 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
* 18:05 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T32405|T32405]] [[phab:T254287|T254287]] Stop varying wgMFSpecialCaseMainPage (duration: 01m 05s)
* 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 15:56 elukey: upgrade spark2 on stat100x to 2.4.4-bin-hadoop2.6-3
* 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
* 15:40 hnowlan@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 16:06 legoktm: depooled mw2336.codfw.mwnet, mgmt is down too. [[phab:T287394|T287394]]
* 15:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
* 15:23 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # [[phab:T287122|T287122]]
* 15:17 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part II (duration: 01m 49s)
* 15:13 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: [[gerrit:707021{{!}}Don’t generate current content text twice]], Part I (duration: 01m 50s)
* 15:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
* 15:10 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:02 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
* 14:57 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/skins/Vector/includes/SkinVector.php: [[phab:T257914|T257914]] Restore div wrapper around print footer (duration: 01m 03s)
* 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 14:53 hnowlan@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' .
* 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
* 14:50 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
* 14:48 jforrester@deploy1001: Synchronized php-1.35.0-wmf.41/extensions/WikibaseMediaInfo/src/WikibaseMediaInfoHooks.php: Fix case of directory name (duration: 01m 05s)
* 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
* 14:48 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
* 14:48 moritzm: rebooting apt1001 for kernel update
* 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
* 14:42 jynus: stopping db1117:3322 (m2) replication temp. for otrs db cloning [[phab:T257928|T257928]]
* 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
* 14:40 hnowlan@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' .
* 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:31 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:15 XioNoX: rollback sampling for [[phab:T286038|T286038]]
* 14:26 oblivian@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 14:23 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 14:21 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
* 14:20 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:18 _joe_: docker-image prune on deneb [[phab:T287222|T287222]]
* 14:18 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 07:17 _joe_: manage-production-images prune on deneb, [[phab:T287222|T287222]]
* 14:18 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
* 14:14 oblivian@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:39 moritzm: installing krb5 security updates
* 14:13 andrewbogott: upgrading wikitech-static to mw 1.34.2
* 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki
* 14:11 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:01 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:42 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1112', diff saved to https://phabricator.wikimedia.org/P11900 and previous config saved to /var/cache/conftool/dbconfig/20200714-132823-marostegui.json
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11899 and previous config saved to /var/cache/conftool/dbconfig/20200714-132742-marostegui.json
* 13:27 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1001.wikimedia.org
* 13:24 jbond42: reboot dns1001
* 13:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:23 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:22 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1001.wikimedia.org
* 13:22 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns1002.wikimedia.org
* 13:18 jbond42: reboot dns1002
* 13:18 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:18 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:18 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns1002.wikimedia.org
* 13:16 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
* 13:13 jbond42: reboot dns2002
* 13:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:13 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:13 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2002.wikimedia.org
* 13:13 jbond@cumin1001: conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org
* 13:10 jbond42: reboot dns2001
* 13:10 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:10 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 13:10 jbond@cumin1001: conftool action : set/pooled=no; selector: name=dns2001.wikimedia.org
* 13:09 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 13:06 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 13:01 jbond42: rebooting dns3002
* 13:01 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:01 jbond@cumin1001: START - Cookbook sre.hosts.downtime
* 12:58 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 12:57 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: revert forcehttps after fixing [[phab:T257887|T257887]] (duration: 01m 02s)
* 12:31 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 12:24 jbond42: route ns0.wikimedia.org to codfw for reboot
* 12:20 moritzm: installing xen security updates (client-side tools/libs)
* 12:19 jbond42: re-enable puppet fleet
* 12:07 jbond42: disable puppet fleet wide to reboot puppetdb's
* 12:07 jbond42: disable puppet ro reboot puppetdb's
* 12:01 jforrester@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.41
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for query plan checks [[phab:T238966|T238966]] ', diff saved to https://phabricator.wikimedia.org/P11898 and previous config saved to /var/cache/conftool/dbconfig/20200714-113612-marostegui.json
* 11:35 _joe_: restart pybal on lvs2009 [[phab:T257887|T257887]]
* 11:31 _joe_: restart pybal on lvs2010 [[phab:T257887|T257887]]
* 11:25 _joe_: restart pybal on lvs1015 [[phab:T257887|T257887]]
* 11:22 _joe_: restart pybal on lvs1016
* 11:15 jayme@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 11:03 jayme@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 10:59 jayme@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
* 10:56 volans@cumin1001: conftool action : set/pooled=inactive; selector: name=wtp2005.codfw.wmnet
* 10:52 volans: powerdown wtp2005, hardware issue - [[phab:T257903|T257903]]
* 10:47 volans@cumin1001: conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet
* 10:45 jiji@cumin1001: conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet,service=parsoid-php
* 10:45 jiji@cumin1001: conftool action : set/pooled=no; selector: name=wtp2005.codfw.wmnet,service=parsoid
* 10:45 effie: depool wtp2005
* 10:42 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:42 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 10:39 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 10:39 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 10:32 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 10:18 oblivian@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 10:14 James_F: Running AbuseFilter's updateVarDumps for group1 [[phab:T246539|T246539]]
* 10:13 oblivian@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'termbox' for release 'production' .
* 10:10 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
* 10:10 oblivian@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
* 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P11897 and previous config saved to /var/cache/conftool/dbconfig/20200714-094449-marostegui.json
* 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075', diff saved to https://phabricator.wikimedia.org/P11896 and previous config saved to /var/cache/conftool/dbconfig/20200714-094354-marostegui.json
* 09:39 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: ExtensionDistributor: Add REL1_35 as a candidate release (duration: 01m 06s)
* 09:05 jforrester@deploy1001: Finished scap: Re-re-start full scap to push out wmf.41 and switch testwikis to it [[phab:T256669|T256669]] (duration: 51m 41s)
* 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 for PDU upgrade [[phab:T257871|T257871]]', diff saved to https://phabricator.wikimedia.org/P11895 and previous config saved to /var/cache/conftool/dbconfig/20200714-084033-marostegui.json
* 08:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:30 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:30 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:30 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 08:13 jforrester@deploy1001: Started scap: Re-re-start full scap to push out wmf.41 and switch testwikis to it [[phab:T256669|T256669]]
* 08:05 akosiaris: restart pybal on lvs2009
* 08:03 _joe_: restart pybal on lvs1016
* 08:02 akosiaris: restart pybal on lvs2007
* 08:01 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: name=restbase2009.codfw.wmnet
* 08:00 _joe_: restart pybal on lvs1015
* 08:00 akosiaris: restart pybal on lvs2010 after merging https://gerrit.wikimedia.org/r/612487
* 07:52 jforrester@deploy1001: sync aborted: Re-start full scap to push out wmf.41 and switch testwikis to it [[phab:T256669|T256669]] (duration: 02m 14s)
* 07:50 jforrester@deploy1001: Started scap: Re-start full scap to push out wmf.41 and switch testwikis to it [[phab:T256669|T256669]]
* 07:48 oblivian@deploy1001: Synchronized wmf-config/InitialiseSettings.php: revert forcehttps in an attempt to fix [[phab:T257887|T257887]] (duration: 01m 06s)
* 07:32 oblivian@deploy1001: sync-file aborted: revert forcehttps in an attempt to fix [[phab:T257887|T257887]] (duration: 00m 20s)
* 07:31 oblivian@deploy1001: Scap failed!: 7/9 canaries failed their endpoint checks(http://en.wikipedia.org)
* 07:27 moritzm: installing libtasn1-6 security updates
* 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075', diff saved to https://phabricator.wikimedia.org/P11894 and previous config saved to /var/cache/conftool/dbconfig/20200714-071233-marostegui.json
* 07:04 marostegui: Drop gerrit, gerritro, gerrittest users from m2 databases - [[phab:T255715|T255715]]
* 06:58 marostegui: Stop mysql on db1131 for HW maintenance
* 06:56 oblivian@deploy2001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 06:54 jforrester@deploy1001: scap failed: RuntimeError Scap failed!: 9/9 canaries failed their endpoint checks(http://en.wikipedia.org) (duration: 24m 59s)
* 06:54 jforrester@deploy1001: Scap failed!: 9/9 canaries failed their endpoint checks(http://en.wikipedia.org)
* 06:53 oblivian@deploy2001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
* 06:53 marostegui: Deploy MCR schema change on s5 primary master [[phab:T238966|T238966]]
* 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1078', diff saved to https://phabricator.wikimedia.org/P11893 and previous config saved to /var/cache/conftool/dbconfig/20200714-065229-marostegui.json
* 06:29 jforrester@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.41
* 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease a bit db1088 load', diff saved to https://phabricator.wikimedia.org/P11891 and previous config saved to /var/cache/conftool/dbconfig/20200714-051551-marostegui.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for HW maintenance', diff saved to https://phabricator.wikimedia.org/P11890 and previous config saved to /var/cache/conftool/dbconfig/20200714-050931-marostegui.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 from api', diff saved to https://phabricator.wikimedia.org/P11889 and previous config saved to /var/cache/conftool/dbconfig/20200714-050912-marostegui.json
* 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1093 to s6 master and remove read-only from s6 [[phab:T257253|T257253]]', diff saved to https://phabricator.wikimedia.org/P11888 and previous config saved to /var/cache/conftool/dbconfig/20200714-050157-marostegui.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s6 as read-only for maintenance [[phab:T257253|T257253]]', diff saved to https://phabricator.wikimedia.org/P11887 and previous config saved to /var/cache/conftool/dbconfig/20200714-050039-marostegui.json
* 05:00 marostegui: Starting s6 failover from db1131 to db1093 - [[phab:T257253|T257253]]
* 04:59 James_F: 1.35.0-wmf.41 branched at {{Gerrit|7d04152db4f8ea9a459511bed8117101d9bb4602}}
* 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P11886 and previous config saved to /var/cache/conftool/dbconfig/20200714-043907-marostegui.json
* 04:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 in preparation for failover', diff saved to https://phabricator.wikimedia.org/P11885 and previous config saved to /var/cache/conftool/dbconfig/20200714-041548-marostegui.json
* 04:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11884 and previous config saved to /var/cache/conftool/dbconfig/20200714-041440-marostegui.json
* 01:23 ryankemper: Started long-running Elasticsearch reindex of `eqiad`, `codfw`, and `cloudelastic`. tmux session `reindex` under `ryankemper` on `mwmaint1002`
* 01:20 cdanis: ❌cdanis@lvs1015.eqiad.wmnet ~ 🕤🍺 sudo systemctl restart pybal.service
* 01:15 cdanis: ✔️ cdanis@lvs1016.eqiad.wmnet ~ 🕘🍺 sudo systemctl restart pybal.service
* 01:14 cdanis: ✔️ cdanis@lvs2009.codfw.wmnet ~ 🕘🍺 sudo systemctl restart pybal.service
* 01:01 cdanis: ✔️ cdanis@lvs2010.codfw.wmnet ~ 🕘🍺 sudo systemctl restart pybal.service


== 2020-07-13 ==
== 2021-07-24 ==
* 23:06 mutante: releases* delete /usr/local/sbin/sync-* scripts created by rsync::quickdatacopy and let puppet recreate the ones still needed
* 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see [[Phab:T280392]] and [[Phab:T280397]]' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # [[phab:T287321|T287321]]
* 22:27 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I80ca62643f5c}} (duration: 00m 58s)
* 20:12 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@1edde21]: airflow: ship_to_es: Implement multi-index understanding (duration: 00m 29s)
* 20:12 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@1edde21]: airflow: ship_to_es: Implement multi-index understanding
* 20:03 mutante: rsynced reprepro data from releases1001 to releases1002, releases2002
* 19:50 eileen: disable target smart job process-control config revision is {{Gerrit|b00e7680ca}}
* 19:48 milimetric@deploy1001: Finished deploy [analytics/refinery@de0a1f1] (thin): Regular analytics weekly train THIN [analytics/refinery@de0a1f1] (duration: 00m 07s)
* 19:47 milimetric@deploy1001: Started deploy [analytics/refinery@de0a1f1] (thin): Regular analytics weekly train THIN [analytics/refinery@de0a1f1]
* 19:47 milimetric@deploy1001: Finished deploy [analytics/refinery@de0a1f1]: Regular analytics weekly train [analytics/refinery@de0a1f1] (duration: 06m 41s)
* 19:41 milimetric@deploy1001: Started deploy [analytics/refinery@de0a1f1]: Regular analytics weekly train [analytics/refinery@de0a1f1]
* 19:39 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:33 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I1a12124f1811e9a}} (duration: 00m 57s)
* 18:53 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T248343|T248343]] Don't use the 'zeroconf' configuration for VisualEditor (duration: 00m 55s)
* 18:43 dcausse: BACON done
* 18:40 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T257745|T257745]]: Add rollbacker to elwiki (duration: 00m 56s)
* 18:26 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T250810|T250810]]: Set proper language code for some wikis (duration: 00m 56s)
* 18:18 dcausse@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[phab:T256928|T256928]]: Scale largest shards to be closer to 30GB (duration: 00m 56s)
* 16:17 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:17 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 15:56 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:610265{{!}}Load WikibaseClient using extension registration in beta (T257435)]] (duration: 00m 55s)
* 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P11882 and previous config saved to /var/cache/conftool/dbconfig/20200713-155240-marostegui.json
* 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11881 and previous config saved to /var/cache/conftool/dbconfig/20200713-154847-marostegui.json
* 15:39 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:35 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 15:30 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:50 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting DiscussionToolsEnableVisual, default value (duration: 00m 57s)
* 14:17 moritzm: removing lilypond from production [[phab:T257066|T257066]]
* 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11880 and previous config saved to /var/cache/conftool/dbconfig/20200713-133604-marostegui.json
* 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1082', diff saved to https://phabricator.wikimedia.org/P11879 and previous config saved to /var/cache/conftool/dbconfig/20200713-133535-marostegui.json
* 13:05 kormat@cumin1001: dbctl commit (dc=all): 'Fully repool es1022, and set es1020 to zero weight [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11878 and previous config saved to /var/cache/conftool/dbconfig/20200713-130532-kormat.json
* 12:08 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling es1022 after reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11873 and previous config saved to /var/cache/conftool/dbconfig/20200713-120818-kormat.json
* 11:49 Urbanecm: Password reset for User:Alert5 ([[phab:T257806|T257806]])
* 11:44 akosiaris: repool ganeti1007 [[phab:T244530|T244530]]. Start emptying ganeti1008
* 11:08 Urbanecm: EU B&C done
* 11:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|896c042296b4e1f5d88f786981537655e5d9fea9}}: Enable SandboxLink extension in trwiki ([[phab:T256782|T256782]]) (duration: 00m 56s)
* 10:44 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: [[gerrit:612175{{!}} Bumping portals to master (612175)]] (duration: 00m 56s)
* 10:43 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:612175{{!}} Bumping portals to master (612175)]] (duration: 00m 56s)
* 09:42 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 08:58 ema: cp: rolling ats-backend-restart to apply SyslogIdentifier changes -> https://gerrit.wikimedia.org/r/c/operations/puppet/+/611311
* 08:57 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: [[phab:T248343|T248343]] Explicitly set visualeditor-enable to 0 when non-default (duration: 00m 57s)
* 08:44 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1022 for reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11871 and previous config saved to /var/cache/conftool/dbconfig/20200713-084449-kormat.json
* 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1093', diff saved to https://phabricator.wikimedia.org/P11870 and previous config saved to /var/cache/conftool/dbconfig/20200713-083902-marostegui.json
* 08:34 kormat@cumin1001: dbctl commit (dc=all): 'Add weight to es1020, reduce weight on es1022 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11869 and previous config saved to /var/cache/conftool/dbconfig/20200713-083414-kormat.json
* 08:20 kormat: reimaging es1022 [[phab:T257284|T257284]]
* 06:54 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 06:53 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:52 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
* 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:51 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 06:50 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 06:16 marostegui: Reverse gerrit password on m2 master - [[phab:T255715|T255715]]
* 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1093', diff saved to https://phabricator.wikimedia.org/P11868 and previous config saved to /var/cache/conftool/dbconfig/20200713-060410-marostegui.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1093', diff saved to https://phabricator.wikimedia.org/P11867 and previous config saved to /var/cache/conftool/dbconfig/20200713-055422-marostegui.json
* 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1093 for upgrade', diff saved to https://phabricator.wikimedia.org/P11866 and previous config saved to /var/cache/conftool/dbconfig/20200713-054840-marostegui.json
* 05:34 marostegui: Deploy schema change on s3 codfw master, lag will appear on codfw [[phab:T253276|T253276]]
* 05:30 marostegui: Stop replication on db1082 for schema change and triggers removal [[phab:T238966|T238966]]
* 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P11865 and previous config saved to /var/cache/conftool/dbconfig/20200713-052928-marostegui.json
* 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 for innodb compression', diff saved to https://phabricator.wikimedia.org/P11864 and previous config saved to /var/cache/conftool/dbconfig/20200713-051428-marostegui.json


== 2020-07-11 ==
== 2021-07-23 ==
* 19:16 qchris: Restarting Gerrit on gerrit1001 to switch to new gerrit.war and zuul plugin
* 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - [[phab:T287110|T287110]]
* 19:16 qchris@deploy1001: Finished deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit1001 (duration: 00m 07s)
* 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw [[phab:T287110|T287110]]
* 19:15 qchris@deploy1001: Started deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit1001
* 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 19:08 qchris: Restarting Gerrit on gerrit2001 to switch to new gerrit.war and zuul plugin
* 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 18:55 qchris@deploy1001: Finished deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit2001 (duration: 00m 10s)
* 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
* 18:55 qchris@deploy1001: Started deploy [gerrit/gerrit@a71a0df]: Gerrit to v3.2.2-138-g230805407f and zuul plugin to master-12-ge51d7e8 on gerrit2001
* 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 16:15 effie: enable puppet on mc-gp* hosts
* 15:47 papaul: powerdown wdqs2002 for IDRAC reset
* 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - [[phab:T287238|T287238]]
* 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging [[phab:T285384|T285384]]
* 14:16 brennen: gitlab1001: running ansible to deploy [[gerrit:707236{{!}}fix puma exporter listen address]] ([[phab:T275170|T275170]])
* 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]] (duration: 03m 32s)
* 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - [[phab:T271232|T271232]]
* 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
* 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - [[phab:T287244|T287244]]
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
* 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
* 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
* 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
* 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
* 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
* 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
* 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
* 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
* 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
* 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
* 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
* 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
* 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
* 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
* 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
* 03:11 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
* 03:09 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic1*` (eqiad)
* 03:06 ryankemper: [[phab:T287223|T287223]] Installed `nginx-light` on all of `elastic2*` (codfw)
* 02:53 ejegg: updated Fundraising CiviCRM from {{Gerrit|819c11307d}} to {{Gerrit|739c936298}}
* 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
* 01:28 ejegg: updated payments-wiki from {{Gerrit|844b59ee42}} to {{Gerrit|cc5d14ea7f}}
* 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # [[phab:T287222|T287222]]


== 2020-07-10 ==
== 2021-07-22 ==
* 21:52 ryankemper: Started long-running reindex of Elasticsearch indices in `eqiad`, `codfw`, and `dewiki` on `mwmaint1002` under tmux session `reindex` for user `ryankemper`
* 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: [[gerrit:706003{{!}}Make sure enable responsive mode UI reflects actual preference value (T285402)]] (duration: 00m 56s)
* 20:26 jgleeson: updated fundraising-tools from {{Gerrit|08ba1f6177}} to {{Gerrit|f8e424fe32}}
* 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platfom migrations - [[phab:T282855|T282855]] [[phab:T238138|T238138]] [[phab:T282562|T282562]] [[phab:T271168|T271168]] (duration: 00m 55s)
* 19:02 mutante: removing firewall hole for gerrit -> mysql servers on dbproxy servers for misc db's
* 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 18:44 mutante: kubernetes1004 - started nagios-nrpe-server
* 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
* 17:57 ebernhardson: change loginwiki password for Cindy-the-browser-test-bot, no email account was associated to allow for normal reset.
* 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
* 17:05 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I63fcea7737}} (duration: 00m 57s)
* 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 16:16 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
* 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
* 15:57 milimetric@deploy1001: Finished deploy [analytics/refinery@4d40145] (thin): Update EventLogging refine whitelist (THIN) (duration: 00m 08s)
* 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
* 15:56 milimetric@deploy1001: Started deploy [analytics/refinery@4d40145] (thin): Update EventLogging refine whitelist (THIN)
* 19:00 urbanecm: Start server-side upload for 1 video file ([[phab:T287061|T287061]])
* 15:44 milimetric@deploy1001: Finished deploy [analytics/refinery@4d40145]: Update EventLogging refine whitelist (duration: 15m 17s)
* 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]] (duration: 03m 22s)
* 15:30 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 18:58 urbanecm: Start server-side upload for 1 video file ([[phab:T286489|T286489]])
* 15:29 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 18:56 urbanecm: Start server-side upload for 1 video file ([[phab:T286665|T286665]])
* 15:29 milimetric@deploy1001: Started deploy [analytics/refinery@4d40145]: Update EventLogging refine whitelist
* 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - [[phab:T271232|T271232]]
* 15:19 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # [[phab:T286500|T286500]]
* 15:03 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26c23dee57cc105c6ff98f4403618cfab536e089}}: hewikisource: Add namespace aliases ([[phab:T286500|T286500]]) (duration: 00m 55s)
* 14:39 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
* 14:37 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|599c2209c332fb0ebf3079bfb44558eb67ae5657}}: enwikisource: Create upload-shared user group ([[phab:T285130|T285130]]) (duration: 00m 56s)
* 14:30 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 13:41 godog: bounce ms-be1037, not quite responsive
* 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]] (duration: 03m 18s)
* 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110', diff saved to https://phabricator.wikimedia.org/P11860 and previous config saved to /var/cache/conftool/dbconfig/20200710-123604-marostegui.json
* 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 12:20 reedy@deploy1001: Synchronized php-1.35.0-wmf.40/extensions/Score/: Make Score errors use a specific css class (duration: 00m 58s)
* 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - [[phab:T271232|T271232]]
* 10:21 kormat@cumin1001: dbctl commit (dc=all): 'Finish repooling es1021, and remove weight from es1010 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11859 and previous config saved to /var/cache/conftool/dbconfig/20200710-102147-kormat.json
* 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|6a909301d93045ad6752ded08fa5ed7c2972f855}}: Enable the visual editor on the 2021 namespace on Wikimania wiki ([[phab:T287197|T287197]]) (duration: 00m 55s)
* 09:49 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling es1021 after reimage @ 50% [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11858 and previous config saved to /var/cache/conftool/dbconfig/20200710-094954-kormat.json
* 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime
* 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|f765832fa2bcfb9e43516e4962254854c3a3b39a}}: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T287204|T287204]]) (duration: 00m 55s)
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P11857 and previous config saved to /var/cache/conftool/dbconfig/20200710-085157-marostegui.json
* 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
* 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106', diff saved to https://phabricator.wikimedia.org/P11856 and previous config saved to /var/cache/conftool/dbconfig/20200710-085112-marostegui.json
* 18:10 legoktm: testing dc switchover warmup script in eqiad
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1107', diff saved to https://phabricator.wikimedia.org/P11855 and previous config saved to /var/cache/conftool/dbconfig/20200710-085040-marostegui.json
* 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P11853 and previous config saved to /var/cache/conftool/dbconfig/20200710-082346-marostegui.json
* 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11852 and previous config saved to /var/cache/conftool/dbconfig/20200710-082329-marostegui.json
* 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
* 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
* 08:22 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 08:22 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 08:22 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11851 and previous config saved to /var/cache/conftool/dbconfig/20200710-080912-marostegui.json
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1119', diff saved to https://phabricator.wikimedia.org/P11850 and previous config saved to /var/cache/conftool/dbconfig/20200710-080854-marostegui.json
* 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 08:09 kormat@cumin1001: dbctl commit (dc=all): 'Depool es1021 for reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11849 and previous config saved to /var/cache/conftool/dbconfig/20200710-080843-kormat.json
* 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 08:01 kormat@cumin1001: dbctl commit (dc=all): 'Reset es2020/es2021 to correct weights after master switch [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11848 and previous config saved to /var/cache/conftool/dbconfig/20200710-080133-kormat.json
* 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 08:00 moritzm: installing cron security updates on jessie (stretch/buster already fixed)
* 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
* 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P11847 and previous config saved to /var/cache/conftool/dbconfig/20200710-075608-marostegui.json
* 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
* 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1107', diff saved to https://phabricator.wikimedia.org/P11846 and previous config saved to /var/cache/conftool/dbconfig/20200710-075500-marostegui.json
* 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1079', diff saved to https://phabricator.wikimedia.org/P11845 and previous config saved to /var/cache/conftool/dbconfig/20200710-075431-marostegui.json
* 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
* 07:44 kormat: reimaging es1021 to buster [[phab:T257284|T257284]]
* 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 07:43 kormat@cumin1001: dbctl commit (dc=all): 'Add weight to es1020, reduce weight on es1021 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11844 and previous config saved to /var/cache/conftool/dbconfig/20200710-074326-kormat.json
* 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
* 07:41 jbond@deploy1001: Finished deploy [librenms/librenms@0a88d64]: redeplopy to [try and] fix php errors (duration: 00m 05s)
* 16:56 brennen: gitlab1001: running ansible to deploy [[gerrit:706396]] ([[phab:T275170|T275170]])
* 07:41 jbond@deploy1001: Started deploy [librenms/librenms@0a88d64]: redeplopy to [try and] fix php errors
* 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
* 07:32 moritzm: installing e2fsprogs security updates on jessie (stretch/buster already fixed)
* 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
* 07:15 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 07:14 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
* 07:13 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
* 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P11843 and previous config saved to /var/cache/conftool/dbconfig/20200710-065751-marostegui.json
* 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
* 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311', diff saved to https://phabricator.wikimedia.org/P11841 and previous config saved to /var/cache/conftool/dbconfig/20200710-063818-marostegui.json
* 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1134', diff saved to https://phabricator.wikimedia.org/P11840 and previous config saved to /var/cache/conftool/dbconfig/20200710-063746-marostegui.json
* 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
* 06:35 marostegui: Compress InnoDB on db1124:3311 (Sanitarium - lag will appear on s1 on labsdb) - [[phab:T254462|T254462]]
* 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
* 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P11839 and previous config saved to /var/cache/conftool/dbconfig/20200710-044428-marostegui.json
* 15:45 marostegui: Stop db2091 for onsite maintenance
* 01:44 mutante: LDAP - adding coka to wmde and nda ([[phab:T257038|T257038]])
* 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
* 00:47 Reedy: truncated labswiki.interwiki table (outdated and unnecessary)
* 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
* 15:14 mmandere: pool lvs1015 - [[phab:T286065|T286065]]
* 15:14 jynus: shutdown db2097 for hw servicing [[phab:T287072|T287072]]
* 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
* 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
* 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
* 14:47 mmandere: depool lvs1015 - [[phab:T286065|T286065]]
* 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
* 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - [[phab:T286065|T286065]]
* 14:29 effie: restarting pybal in lvs2009 and lvs1015
* 14:27 moritzm: installing libwebp security updates on stretch
* 14:25 effie: restarting pybal in lvs2010 and lvs1016
* 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0208fc2b71863c91c3e767373d4bea1a2eaf178d}}: Growth: Add mentor dashboard related config ([[phab:T278920|T278920]]) (duration: 00m 55s)
* 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
* 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
* 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
* 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
* 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
* 11:36 Lucas_WMDE: EU backport+config window done
* 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (2/2) (duration: 01m 04s)
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: [[gerrit:705690{{!}}Avoid using MWHttpRequest::factory()]] (1/2) (duration: 01m 04s)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: [[gerrit:705682{{!}}Avoid using WikiPage::factory()]] (duration: 01m 06s)
* 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
* 10:45 effie: restart pybal on lvs2009 and lvs1015
* 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
* 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:42 effie: restart pybal on lvs2010  and lvs1016
* 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
* 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
* 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
* 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
* 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - [[phab:T287110|T287110]]
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
* 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
* 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - [[phab:T287110|T287110]]
* 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
* 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
* 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
* 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
* 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
* 05:31 ryankemper: [[phab:T281327|T281327]] [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
* 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
* 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE


== 2020-07-09 ==
== 2021-07-21 ==
* 23:10 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2c2dea832}} (duration: 00m 56s)
* 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis ([[phab:T257066|T257066]]) (duration: 01m 03s)
* 21:52 tgr: all sessions have been invalidated due to [[phab:T256395|T256395]]
* 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 20:58 eileen: https://phabricator.wikimedia.org/T253152
* 22:41 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 19:16 herron: upgraded eqiad elk7 cluster from 7.4.2 to 7.8.0 [[phab:T234854|T234854]]
* 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 19:05 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.40  refs [[phab:T256668|T256668]]
* 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 18:51 elukey: update spark2 to 2.4.4-bin-hadoop2.6-3 for buster-wikimedia
* 22:37 ryankemper: [[phab:T280382|T280382]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
* 18:44 mutante: stat1004, stat1006, stat1007 - upgrading git-review package from 1.25 to 1.27 so that it keeps working with new Gerrit 3.2 ([[phab:T257609|T257609]])
* 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 18:10 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|9f2557f848e99facaa62ca6b3a948cc3e32c32a3}}: Updating config for Readers Web affinity quicksurvey ([[phab:T246977|T246977]]) (duration: 01m 06s)
* 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
* 17:42 chaomodus: codfw frack management dns automation deployment complete [[phab:T233183|T233183]]
* 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 17:37 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:36 James_F: Synchronized wmf-config/CommonSettings.php: ExtensionDistribution: Drop REL1_33, EOL'ed [[phab:T256087|T256087]]
* 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:35 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 20:27 dancy: testing upcoming Scap release on beta
* 17:35 moritzm: rebooting moscovium for kernel update
* 18:27 ryankemper: [[phab:T281327|T281327]] [Elastic] `sudo -i wmf-auto-reimage-host -p [[phab:T281327|T281327]] elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
* 17:33 chaomodus: deploying frack codfw management dns automation
* 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
* 17:32 crusnov@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
* 17:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
* 17:28 crusnov@cumin2001: START - Cookbook sre.dns.netbox
* 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 17:27 moritzm: rebooting planet1002 (planet.wikimedia.org) for kernel update
* 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|1453831db13e17e550a86dd99d09dc26eeb242b1}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 04s)
* 17:27 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: {{Gerrit|aca510b773a67d24452731d5d6a33952c57592b8}}: Do not teardown newtopictool interface if it was not setup ([[phab:T287035|T287035]]) (duration: 01m 05s)
* 17:10 krinkle@deploy1001: Synchronized wmf-config/: {{Gerrit|Ia2f5eddbf2aad2}} (duration: 01m 04s)
* 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
* 17:09 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|Ia2f5eddbf2aad2}} (duration: 01m 05s)
* 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 15:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
* 15:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # [[phab:T285811|T285811]]
* 14:29 papaul: replacing msw-b1,b2,b3 and b4
* 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: [[gerrit:705912{{!}}Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085)]] (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
* 14:03 moritzm: installing libtirpc security updates
* 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
* 13:45 moritzm: installing gnutls28 security updates
* 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
* 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1089', diff saved to https://phabricator.wikimedia.org/P11831 and previous config saved to /var/cache/conftool/dbconfig/20200709-133134-marostegui.json
* 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 13:31 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
* 13:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
* 13:29 moritzm: rebooting puppetboard1001 (puppetboard.wikimedia.org) for kernel update
* 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 13:15 moritzm: installing ffmpeg security updates
* 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
* 13:11 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
* 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1089', diff saved to https://phabricator.wikimedia.org/P11830 and previous config saved to /var/cache/conftool/dbconfig/20200709-131039-marostegui.json
* 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 13:08 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:17 moritzm: installing intel-microcode security updates on stretch
* 13:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
* 13:05 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for [[phab:T286679|T286679]] (duration: 04m 45s)
* 13:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for [[phab:T286679|T286679]]
* 12:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:40 papaul: powerdown ms-be2038 for BBU replacement
* 12:57 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
* 12:57 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]] (duration: 00m 09s)
* 12:56 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # [[phab:T280197|T280197]]
* 12:56 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
* 12:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
* 12:54 moritzm: rebooting install* servers for kernel security update
* 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 [[phab:T287036|T287036]]
* 12:43 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
* 12:40 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
* 12:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
* 12:38 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
* 12:38 moritzm: rebooting urldownloader1001/2001 for kernel update (failed over, these are now the inactive ones)
* 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
* 12:23 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99)
* 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
* 12:22 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
* 12:22 moritzm: rebooting dbmonitor1001 / tendril.wikimedia.org for kernek update
* 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
* 12:11 XioNoX: enable asw2-b-eqiad:ae3 (to cloudsw1-c8) - [[phab:T251632|T251632]]
* 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
* 11:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
* 11:54 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
* 11:52 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 11:50 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
* 11:50 moritzm: rebooting debmonitor1001 for kernel update
* 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
* 11:42 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.40/extensions/Translate/tag/SpecialPageTranslation.php: {{Gerrit|6541d3ff51f52fe8a1bdbfa86022f8d97d6c7680}}: DeprecatablePropertyArray: Use MW_VERSION instead of array_key_exists ([[phab:T257531|T257531]]) (duration: 01m 05s)
* 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
* 11:28 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|3a7c1c33e58637437f819edf039008a00dc5be27}}: Rename namespace on kn.wikipedia.org ([[phab:T255337|T255337]]) (duration: 01m 04s)
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6699dae1e96b38b4fae7e8b9817d84b56d2be6c}}: GrowthExperiments: Add more wikis to linkrecommendation experiment ([[phab:T284481|T284481]]) (duration: 01m 31s)
* 11:24 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0a3c1f94a702b527842ed4f34d8bf41b26235e64}}: Add *.oireachtas.ie to the wgCopyUploadsDomains whitelist for commonswiki ([[phab:T256543|T256543]]) (duration: 01m 04s)
* 10:50 moritzm: installing systemd security updates on bullseye
* 11:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
* 11:17 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:14 effie: enable puppet on mw* servers
* 11:10 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - [[phab:T287038|T287038]]
* 11:10 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
* 09:34 jynus: restart db2097 [[phab:T287072|T287072]]
* 11:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|e6f442c6900524482806aeb1b5162e65bf7c97ac}}: Enable Quicksurveys for Desktop Improvements Project ([[phab:T246977|T246977]]) (duration: 01m 06s)
* 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
* 11:01 vgutierrez: restart ats-tls on cp1085
* 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:55 _joe_: restarting php7.2-fpm on mw1282, workers failing with sigill
* 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]] (duration: 45m 51s)
* 10:54 _joe_: depool mw1282
* 08:45 effie: disble puppet on codfw mw hosts to deploy 702592
* 10:54 mvolz@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:34 mvolz@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 08:31 godog: upgrade karma on alert hosts - [[phab:T284213|T284213]]
* 10:23 _joe_: rolling restart the remaining restbases in eqiad, and all of codfw
* 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 10:22 mvolz@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 [[phab:T281058|T281058]]
* 10:09 _joe_: restarting restbase on rb1020-22
* 08:17 effie: enable puppet on alert*
* 09:53 _joe_: restarting restbase on restbase1024,1023
* 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # [[phab:T281156|T281156]]
* 09:36 _joe_: restarting restbase on rb1026,1027 to switch to proton on k8s
* 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
* 09:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
* 09:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 09:28 _joe_: restarting restbase on restbase1025 to pick up the switch to k8s of proton
* 07:56 XioNoX: push extra sampling on cr2-eqiad - [[phab:T286038|T286038]]
* 09:27 godog: bounce thanos-compact on thanos-fe2001
* 07:44 XioNoX: push extra sampling on cr1-eqiad - [[phab:T286038|T286038]]
* 09:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 07:38 XioNoX: update RIS peer IP on cr2-codfw
* 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P11828 and previous config saved to /var/cache/conftool/dbconfig/20200709-085228-marostegui.json
* 07:16 godog: powercycle ms-be2048
* 08:44 marostegui: Stop haproxy on dbproxy1017 before upgrading to buster - [[phab:T255408|T255408]]
* 07:03 moritzm: installing systemd security updates on stretch
* 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1136', diff saved to https://phabricator.wikimedia.org/P11827 and previous config saved to /var/cache/conftool/dbconfig/20200709-082355-marostegui.json
* 06:51 effie: restart memcached on eqiad mc* hosts
* 08:23 moritzm: imported osm2pgsql 0.96.0+ds-1~bpo9+1 to "main" component [[phab:T256877|T256877]]
* 06:51 effie: enable puppet on mc* hosts
* 08:22 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 06:35 effie: disable puppet on mc1* hosts and icinga - [[phab:T271967|T271967]]
* 08:20 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:13 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 08:11 XioNoX: disable igmp snooping on msw1-codfw
* 07:59 marostegui: Stop db1117:3322 to clone db1084, this will trigger haproxy alerts - [[phab:T257540|T257540]]
* 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1136', diff saved to https://phabricator.wikimedia.org/P11825 and previous config saved to /var/cache/conftool/dbconfig/20200709-075749-marostegui.json
* 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 06:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P11824 and previous config saved to /var/cache/conftool/dbconfig/20200709-053905-marostegui.json
* 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1084 from dbctl', diff saved to https://phabricator.wikimedia.org/P11823 and previous config saved to /var/cache/conftool/dbconfig/20200709-053206-marostegui.json
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084', diff saved to https://phabricator.wikimedia.org/P11822 and previous config saved to /var/cache/conftool/dbconfig/20200709-051826-marostegui.json
* 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317', diff saved to https://phabricator.wikimedia.org/P11821 and previous config saved to /var/cache/conftool/dbconfig/20200709-051355-marostegui.json
* 05:11 marostegui: Remove revision triggers from db2093:3315 [[phab:T238966|T238966]]
* 05:10 marostegui: Deploy schema change on s5 codfw, lag will be generated - [[phab:T238966|T238966]]
* 01:43 tzatziki: reset email for GseSro
* 00:58 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 sudo cumin A:cp 'enable-puppet "cdanis deploying {{Gerrit|I6c1b646e}} [[phab:T256395|T256395]]"'
* 00:49 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕘🍺 sudo cumin A:cp 'disable-puppet "cdanis deploying {{Gerrit|I6c1b646e}} [[phab:T256395|T256395]]"'


== 2020-07-08 ==
== 2021-07-20 ==
* 21:56 mutante: deleting files from releases2001 that are not existing on releases1001 to make them mirrors. rsync with --delete and the command from quickdatacopy class ([[phab:T247652|T247652]])
* 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|caa5a076f39b051b01622aa3e4c9d716a8643eef}}: Set wgGEMentorDashboardBackendEnabled properly ([[phab:T285811|T285811]]) (duration: 00m 57s)
* 21:55 mutante: rsyncing releases files from releases1001 to releases2002 and releases1002. deleting files from releases2002 not existing on releases1002 to make them mirrors ( [[phab:T247652|T247652]]_
* 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|dafd953eb5cd35bddbd2fd348b03066420a42362}}: updateMenteeData: Make it possible to disable script per-wiki ([[phab:T285811|T285811]]) (duration: 00m 58s)
* 20:59 cstone: civicrm revision changed from {{Gerrit|d73ee2e73f}} to {{Gerrit|8b09c87ce2}},
* 18:57 urbanecm: Start server-side upload for 4 large PNG files ([[phab:T285708|T285708]])
* 20:27 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T256012|T256012]])
* 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
* 20:08 Amir1_: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https ([[phab:T256012|T256012]])
* 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 19:18 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.40  refs [[phab:T256668|T256668]] (duration: 01m 04s)
* 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
* 19:17 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.40  refs [[phab:T256668|T256668]]
* 17:06 rzl: enabled puppet on A:mw
* 18:58 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|091442cf035a6d76f1211291afbb3193c513595d}}: Add *.nga.gov to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T256518|T256518]]) (duration: 01m 04s)
* 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 18:55 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|2e5943ddb30e08607a9ffb6ed05a042e8367e2e1}}: Add scan-bugs.org to $wgCopyUploadsDomains ([[phab:T256569|T256569]]) (duration: 01m 04s)
* 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
* 18:46 urbanecm@deploy1001: Synchronized static/images/project-logos/: {{Gerrit|f42cdf2}}: Change bnwiki logo ([[phab:T255328|T255328]]) (duration: 01m 04s)
* 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
* 18:27 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Cleanup: remove temporary wmgDisableHTCP variable gerrit:607596 [[phab:T250781|T250781]] IS.php (duration: 01m 01s)
* 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 18:20 ppchelko@deploy1001: Synchronized wmf-config/CommonSettings.php: Disable HTCP purging everywhere gerrit:607593 [[phab:T250781|T250781]] CS.php (duration: 01m 03s)
* 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
* 18:18 ppchelko@deploy1001: Synchronized wmf-config/wikitech.php: Disable HTCP purging everywhere gerrit:607593 [[phab:T250781|T250781]] wikitech.php (duration: 01m 04s)
* 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:17 ppchelko@deploy1001: Synchronized wmf-config/reverse-proxy.php: Disable HTCP purging everywhere gerrit:607593 [[phab:T250781|T250781]] reverse-proxy.php (duration: 01m 04s)
* 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
* 18:11 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceDefault to refactor EventBus event stream config gerrit:610160 [[phab:T229863|T229863]], IS.php (duration: 01m 03s)
* 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 18:04 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add wgEventServiceDefault to refactor EventBus event stream config gerrit:610160 [[phab:T229863|T229863]] (duration: 01m 04s)
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
* 17:34 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
* 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
* 17:16 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
* 17:16 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
* 17:08 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
* 16:57 _joe_: restarting restbase across the fleet to transition to using envoy
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
* 16:40 _joe_: restarting restbase on restbase2010 to route calls to mediawiki, parsoid via envoy
* 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
* 16:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:23 vgutierrez: pool dns1002 - [[phab:T286069|T286069]]
* 16:32 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 16:27 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
* 15:22 jgleeson: updated fundraising-tools from {{Gerrit|a244e0e85f}} --> {{Gerrit|f5b8528214}}
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
* 15:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
* 15:12 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
* 15:12 moritzm: rebooting people1002 (people.wikimedia.org) for kernel security update
* 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 15:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 [[phab:T281058|T281058]]
* 15:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:53 urbanecm: Start server-side upload for 7 large PNG files ([[phab:T285708|T285708]])
* 14:48 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
* 14:46 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:46 moritzm: installing isc-dhcp security updates
* 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
* 14:31 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.change-distro (exit_code=99)
* 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:31 moritzm: installing gdk-pixbuf security updates
* 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
* 14:26 _joe_: repooling mw1346
* 14:46 vgutierrez: depool dns1002 - [[phab:T286069|T286069]]
* 14:24 _joe_: php7adm /opcache-free on mw1346
* 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:15 jbond42: switch icinga authentication to CAS SSO
* 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
* 14:12 _joe_: depooling mw1346
* 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - [[phab:T286069|T286069]]
* 14:12 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:11 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 [[phab:T281058|T281058]]
* 14:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 14:04 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 14:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 14:04 moritzm: rebooting idp-test1001 for kernel update
* 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 13:59 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.stop-cluster (exit_code=97)
* 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 13:59 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 [[phab:T281058|T281058]]
* 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.change-distro (exit_code=0)
* 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 13:31 jynus: replacing ssh key for ci_docroot at deploy1001
* 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 13:31 moritzm: imported git 2.20.1-2+deb10u3~wmf1 for stretch-wikimedia component/git [[phab:T257308|T257308]]
* 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
* 13:10 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:07 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
* 13:00 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 12:41 marostegui: Deploy schema change on s7 codfw, lag is expected
* 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 [[phab:T281058|T281058]]
* 12:17 xionox-tmp: rollout less frequent option-refresh-rate - [[phab:T240658|T240658]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
* 12:01 xionox-tmp: renumber eqiad NTT link - [[phab:T254877|T254877]]
* 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
* 11:42 awight: EU BACON complete
* 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
* 11:41 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:610234{{!}}Undeploy graphoid for phase 1 wikis (T257402)]] (duration: 01m 03s)
* 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10{{!}}0[1-9]).codfw.wmnet
* 11:31 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:610268{{!}}Add nature.com to commonswiki wgCopyUploadDomains (T254342)]] (duration: 01m 03s)
* 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 11:29 moritzm: installing freetype security updates
* 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 [[phab:T281058|T281058]]
* 11:26 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:609991{{!}}[hiwikibooks] Translate sitename for hi.wikibooks (T256587)]] (duration: 01m 03s)
* 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - [[phab:T285643|T285643]]
* 11:19 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:609990{{!}}[arwiki] Grant 'patrolmarks' to all (T257106)]] (duration: 01m 04s)
* 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 11:18 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'proton' for release 'production' .
* 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 [[phab:T281058|T281058]]
* 11:18 moritzm: installing libgcrypt20 security updates
* 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
* 11:16 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'proton' for release 'production' .
* 12:44 moritzm: installing systemd security updates on buster
* 11:07 awight@deploy1001: Synchronized wmf-config/InitialiseSettings.php: BACON: [[gerrit:610056{{!}}Provision WMDE TeWü survey for prototype 1 (T257306)]], file 2/2 (duration: 01m 03s)
* 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
* 11:06 awight@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BACON: [[gerrit:610056{{!}}Provision WMDE TeWü survey for prototype 1 (T257306)]], file 1/2 (duration: 01m 16s)
* 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
* 11:05 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'proton' for release 'production' .
* 11:58 Lucas_WMDE: EU config+backport window done
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121', diff saved to https://phabricator.wikimedia.org/P11818 and previous config saved to /var/cache/conftool/dbconfig/20200708-110546-marostegui.json
* 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (3/3) (duration: 00m 56s)
* 10:51 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 10:51 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
* 10:50 akosiaris: apply calico egress policies
* 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (2/3) (duration: 00m 56s)
* 10:50 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
* 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: [[gerrit:705505{{!}}Avoid using User::newFrom* methods]] (1/3) (duration: 00m 56s)
* 10:45 moritzm: installing json-c security updates
* 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 3/3) (duration: 00m 56s)
* 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P11817 and previous config saved to /var/cache/conftool/dbconfig/20200708-102553-marostegui.json
* 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 2/3) (duration: 00m 56s)
* 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1084', diff saved to https://phabricator.wikimedia.org/P11816 and previous config saved to /var/cache/conftool/dbconfig/20200708-102500-marostegui.json
* 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: {{Gerrit|e52ae37dc2010ed2483328921a274e4934940791}}: otrs_wikiwiki: Update logo to use VRT instead of OTRS ([[phab:T280400|T280400]]; 1/3) (duration: 00m 57s)
* 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084', diff saved to https://phabricator.wikimedia.org/P11815 and previous config saved to /var/cache/conftool/dbconfig/20200708-101313-marostegui.json
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705498{{!}}Add patroller group for ckbwiki (T285221)]] (duration: 00m 57s)
* 09:58 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (2/2, beta) (duration: 00m 56s)
* 09:57 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:705107{{!}}Typo fix: "the the" -> "the" (T201491)]] (1/2, prod) (duration: 00m 57s)
* 09:56 kormat@cumin2001: START - Cookbook sre.hosts.downtime
* 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:704867{{!}}Update config for language switching on pilot wikis (T286459)]] (duration: 00m 59s)
* 09:50 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1149', diff saved to https://phabricator.wikimedia.org/P11814 and previous config saved to /var/cache/conftool/dbconfig/20200708-094539-marostegui.json
* 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149', diff saved to https://phabricator.wikimedia.org/P11813 and previous config saved to /var/cache/conftool/dbconfig/20200708-092650-marostegui.json
* 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P11812 and previous config saved to /var/cache/conftool/dbconfig/20200708-092627-marostegui.json
* 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:24 xionox-tmp: renumber eqord NTT link - [[phab:T254877|T254877]]
* 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:18 xionox-tmp: remove eqord-eqiad tunnel - [[phab:T254877|T254877]]
* 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
* 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P11811 and previous config saved to /var/cache/conftool/dbconfig/20200708-091557-marostegui.json
* 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
* 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1147', diff saved to https://phabricator.wikimedia.org/P11810 and previous config saved to /var/cache/conftool/dbconfig/20200708-085745-marostegui.json
* 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 08:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 [[phab:T281058|T281058]]
* 08:54 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
* 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
* 08:54 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
* 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P11809 and previous config saved to /var/cache/conftool/dbconfig/20200708-085024-marostegui.json
* 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
* 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074', diff saved to https://phabricator.wikimedia.org/P11808 and previous config saved to /var/cache/conftool/dbconfig/20200708-084227-marostegui.json
* 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
* 08:40 moritzm: upgrading docker on remaining buster hosts
* 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
* 08:38 hashar: Upgraded docker.io on contint1001 and contint2001
* 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
* 08:28 marostegui: Remove dbproxy1003 grants from misc hosts [[phab:T231280|T231280]]
* 03:17 eileen: civicrm revision changed from {{Gerrit|20e9ef6bbb}} to {{Gerrit|819c11307d}}, config revision is {{Gerrit|bb405c5232}}
* 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11807 and previous config saved to /var/cache/conftool/dbconfig/20200708-082624-marostegui.json
* 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P11806 and previous config saved to /var/cache/conftool/dbconfig/20200708-082040-marostegui.json
* 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11805 and previous config saved to /var/cache/conftool/dbconfig/20200708-081647-marostegui.json
* 08:15 kormat@cumin1001: dbctl commit (dc=all): 'Depool es2020 for reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11804 and previous config saved to /var/cache/conftool/dbconfig/20200708-081519-kormat.json
* 08:00 marostegui: Failover m1 from db1097 to db1080 - [[phab:T256717|T256717]]
* 07:57 kormat: reimaging es2020 to buster [[phab:T257284|T257284]]
* 07:51 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314', diff saved to https://phabricator.wikimedia.org/P11803 and previous config saved to /var/cache/conftool/dbconfig/20200708-074939-marostegui.json
* 07:49 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 07:48 jynus: stop bacula-director on backup1001 in preparation for m1 switchover [[phab:T256717|T256717]]
* 07:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 07:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 07:47 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 07:47 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
* 07:45 moritzm: installing PHP 7.3 security updates
* 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1143', diff saved to https://phabricator.wikimedia.org/P11802 and previous config saved to /var/cache/conftool/dbconfig/20200708-073548-marostegui.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P11801 and previous config saved to /var/cache/conftool/dbconfig/20200708-073037-marostegui.json
* 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1142', diff saved to https://phabricator.wikimedia.org/P11800 and previous config saved to /var/cache/conftool/dbconfig/20200708-073011-marostegui.json
* 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P11799 and previous config saved to /var/cache/conftool/dbconfig/20200708-072431-marostegui.json
* 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1141', diff saved to https://phabricator.wikimedia.org/P11798 and previous config saved to /var/cache/conftool/dbconfig/20200708-070921-marostegui.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P11797 and previous config saved to /var/cache/conftool/dbconfig/20200708-070432-marostegui.json
* 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1138', diff saved to https://phabricator.wikimedia.org/P11796 and previous config saved to /var/cache/conftool/dbconfig/20200708-070403-marostegui.json
* 06:47 marostegui: start topology changes on m1 [[phab:T256717|T256717]]
* 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1138', diff saved to https://phabricator.wikimedia.org/P11795 and previous config saved to /var/cache/conftool/dbconfig/20200708-064354-marostegui.json
* 06:36 marostegui: Deploy schema change on s2 primary master db1122 [[phab:T238966|T238966]]
* 06:18 _joe_: rolling restart of restbase to pick up the proton url change
* 03:36 andrew@deploy1001: Finished deploy [horizon/deploy@505819d]: further fixes for proxy editing --bug 610130 (duration: 03m 44s)
* 03:32 andrew@deploy1001: Started deploy [horizon/deploy@505819d]: further fixes for proxy editing --bug 610130


== 2020-07-07 ==
== 2021-07-19 ==
* 22:41 mutante: new Wikimedia Annual Report 2019 now available on annual.wikimedia.org
* 20:48 urbanecm: Deploy security patch for [[phab:T286884|T286884]]
* 21:29 andrew@deploy1001: Finished deploy [horizon/deploy@fce8183]: further fixes for proxy editing --bug 610130 (duration: 03m 35s)
* 20:29 vgutierrez: pool text@codfw - [[phab:T286921|T286921]]
* 21:25 andrew@deploy1001: Started deploy [horizon/deploy@fce8183]: further fixes for proxy editing --bug 610130
* 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:10 andrew@deploy1001: Finished deploy [horizon/deploy@abcd051]: further fixes for proxy editing --bug 610130 (duration: 03m 26s)
* 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
* 21:07 andrew@deploy1001: Started deploy [horizon/deploy@abcd051]: further fixes for proxy editing --bug 610130
* 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: [[gerrit:705467{{!}}prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877)]] (duration: 00m 58s)
* 20:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@05b8bd5]: Remove restbase2009, take 2 (duration: 09m 15s)
* 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
* 20:32 ppchelko@deploy1001: Started deploy [restbase/deploy@05b8bd5]: Remove restbase2009, take 2
* 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: [[gerrit:705448{{!}}Add sanity check to newRevisionFromRowAndSlots. (T286877)]] (duration: 00m 57s)
* 20:32 ppchelko@deploy1001: Finished deploy [restbase/deploy@05b8bd5]: Remove restbase2009 (duration: 14m 28s)
* 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - [[phab:T286921|T286921]]
* 20:24 mutante: kubernetes1003 - starting nagios-nrpe-server
* 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - [[phab:T286921|T286921]]
* 20:23 mutante: kubernetes1001 - starting nagios-nrpe-server
* 18:46 brennen: gerrit1001: restarting gerrit
* 20:17 ppchelko@deploy1001: Started deploy [restbase/deploy@05b8bd5]: Remove restbase2009
* 18:40 vgutierrez: stop pybal on lvs2009  - [[phab:T286921|T286921]]
* 19:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 18:38 brennen: re-enabling puppet on gerrit1001]
* 19:27 mutante: destroying VM gerrit1002 - decom cookbook
* 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - [[phab:T286921|T286921]]
* 19:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P<nowiki>{</nowiki>relforge*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 19:18 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.40  refs [[phab:T256668|T256668]]
* 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
* 19:04 mutante: contint2001 - move /var/lib/zuul/.ssh/known_hosts to root and run puppet to recreate it
* 18:27 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P<nowiki>{</nowiki>cloudelastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:38 andrew@deploy1001: Finished deploy [horizon/deploy@eaa056e]: fix for proxy editing --bug 610130 (duration: 03m 18s)
* 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - [[phab:T286921|T286921]]
* 18:35 andrew@deploy1001: Started deploy [horizon/deploy@eaa056e]: fix for proxy editing --bug 610130
* 18:20 vgutierrez: enabling pybal on lvs2007 - [[phab:T286921|T286921]]
* 18:27 andrew@deploy1001: Finished deploy [horizon/deploy@a39e86c]: update proxy UI to support editing existing proxies (duration: 03m 26s)
* 18:19 ryankemper: [[phab:T264053|T264053]] Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P<nowiki>{</nowiki>elastic*<nowiki>}</nowiki>' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
* 18:23 andrew@deploy1001: Started deploy [horizon/deploy@a39e86c]: update proxy UI to support editing existing proxies
* 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
* 18:10 krinkle@deploy1001: Synchronized w/: remove untracked test cookie file (duration: 01m 04s)
* 18:06 dancy@deploy1002: Synchronized .pipeline: Config: [[gerrit:705437{{!}}pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately]] (duration: 00m 56s)
* 18:08 krinkle@deploy1001: Synchronized php-1.35.0-wmf.40/includes/Revision/RevisionStore.php: {{Gerrit|I8f986daeab4}} (duration: 01m 05s)
* 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:59 herron: imported (logstash{{!}}kibana{{!}}elasticsearch)-oss-7.8.0 into buster-wikimedia thirdparty/elastic78
* 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 17:54 hnowlan: finished removing restbase2009 from cassandra pool
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
* 17:06 hnowlan: removed restbase2009-b from cassandra pool, removing restbase2009-c
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 16:40 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.40/extensions/Wikibase: Backport: [[gerrit:610086{{!}}Revert "Don’t load $wgWBClientSettings in WikibaseClient.php" (T257296)]] (duration: 01m 10s)
* 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
* 15:49 hnowlan: running nodetool removenode for restbase2009-a
* 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 15:38 hnowlan@deploy1001: Started restart [restbase/deploy@05b8bd5]: Restarting restbase after removal of restbase2009
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
* 15:27 elukey: root-tmux on cumin1001 - cumin 'c:profile::mediawiki::mcrouter_wancache' '/usr/local/sbin/restart-mcrouter' -b 2 -s 5 - roll restart of mw-mcrouter to pick up new settings - [[phab:T255511|T255511]]
* 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 15:13 hnowlan@deploy1001: Started restart [restbase/deploy@05b8bd5]: Restarting restbase after removal of restbase2009
* 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
* 15:12 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
* 15:12 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
* 15:09 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
* 15:09 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:30 volans: running puppet on elastic2038 after nework was restored
* 15:06 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
* 15:04 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 15:04 otto@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
* 15:02 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 15:02 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 15:01 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:58 hashar@deploy1001: Finished deploy [integration/docroot@708d3eb]: Second deployment to ensure everything works fine. Thank you jynus (duration: 00m 04s)
* 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 14:58 hashar@deploy1001: Started deploy [integration/docroot@708d3eb]: Second deployment to ensure everything works fine. Thank you jynus
* 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:53 _joe_: restarted restbase on restbase2022 after removing restbase2009 from the cassandra seeds
* 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
* 14:48 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:23 volans: running authdns-update to force-update authdns2001
* 14:47 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
* 14:38 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:38 otto@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 17:21 XioNoX: remove ns1 redirect - [[phab:T286787|T286787]]
* 14:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
* 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
* 14:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
* 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:30 papaul: replacing msw-a5,a6,a7 and a8
* 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
* 14:30 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
* 14:24 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
* 17:10 XioNoX: enable asw-a2-codfw access ports - [[phab:T286787|T286787]]
* 14:24 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
* 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - [[phab:T286787|T286787]]
* 14:20 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
* 14:20 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
* 14:16 hashar@deploy1001: Finished deploy [integration/docroot@708d3eb]: (no justification provided) (duration: 00m 09s)
* 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
* 14:16 hashar@deploy1001: Started deploy [integration/docroot@708d3eb]: (no justification provided)
* 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
* 13:38 _joe_: rolling restart of restbase to pick up using envoy
* 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
* 13:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
* 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
* 13:31 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
* 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
* 13:29 XioNoX: cr2-eqiad> request vmhost snapshot routing-engine both - [[phab:T257153|T257153]]
* 16:40 XioNoX: update asw-a2-codfw serial number - [[phab:T286787|T286787]]
* 13:24 XioNoX: cr1-eqiad> request vmhost snapshot routing-engine both - [[phab:T257153|T257153]]
* 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:15 kormat@cumin1001: dbctl commit (dc=all): 'Promote es2021 to es4 master [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11789 and previous config saved to /var/cache/conftool/dbconfig/20200707-131524-kormat.json
* 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
* 12:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 12:57 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
* 12:44 kormat: starting (codfw) es5 failover from es2020 to es2021 [[phab:T257284|T257284]]
* 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 12:30 kormat@cumin1001: dbctl commit (dc=all): 'Set es2021 to weight 50 [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11787 and previous config saved to /var/cache/conftool/dbconfig/20200707-123003-kormat.json
* 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
* 12:12 jforrester@deploy1001: Finished scap: Full scap and testwikis to 1.35.0-wmf.40 for [[phab:T256668|T256668]] (duration: 33m 09s)
* 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 12:01 marostegui: Deploy schema change on labswiki (wikitech) master - [[phab:T253276|T253276]]
* 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenace
* 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1082', diff saved to https://phabricator.wikimedia.org/P11786 and previous config saved to /var/cache/conftool/dbconfig/20200707-115838-marostegui.json
* 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:39 jforrester@deploy1001: Started scap: Full scap and testwikis to 1.35.0-wmf.40 for [[phab:T256668|T256668]]
* 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
* 11:38 jforrester@deploy1001: scap failed: LockFailedError Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "jforrester"; reason is "testwikis wikis to 1.35.0-wmf.40" (duration: 00m 00s)
* 16:21 mutante: depooled logstash2021 for dcops maintenance work
* 11:33 moritzm: installing PHP 7.0 security updates
* 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
* 11:29 marostegui: Deploy schema change on db1082, this will create lag on s5 labs
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
* 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082', diff saved to https://phabricator.wikimedia.org/P11784 and previous config saved to /var/cache/conftool/dbconfig/20200707-112926-marostegui.json
* 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
* 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1130', diff saved to https://phabricator.wikimedia.org/P11783 and previous config saved to /var/cache/conftool/dbconfig/20200707-112830-marostegui.json
* 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
* 11:26 godog: test bumping logstash7 batch size to 256
* 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
* 11:17 moritzm: prune PHP 7.0 packages from mwdebug1001/2001/2002
* 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|310be45f7}} (duration: 00m 57s)
* 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P11782 and previous config saved to /var/cache/conftool/dbconfig/20200707-110506-marostegui.json
* 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
* 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1110', diff saved to https://phabricator.wikimedia.org/P11781 and previous config saved to /var/cache/conftool/dbconfig/20200707-110412-marostegui.json
* 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
* 10:57 moritzm: prune PHP 7.0 packages from mw2190-mw2214
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
* 10:46 jforrester@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.40
* 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
* 10:44 jforrester@deploy1001: Pruned MediaWiki: 1.35.0-wmf.38 (duration: 17m 23s)
* 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|I2bdfbd258e}} (duration: 00m 57s)
* 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P11780 and previous config saved to /var/cache/conftool/dbconfig/20200707-103255-marostegui.json
* 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P11779 and previous config saved to /var/cache/conftool/dbconfig/20200707-102757-marostegui.json
* 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|I069c7b53}} (duration: 00m 58s)
* 10:26 moritzm: prune PHP 7.0 packages from mw2135-mw2147
* 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:12 addshore@deploy1001: Synchronized wmf-config/config/testcommonswiki.yaml: [[gerrit:609985]] Make testcommonswiki a testwikidata client [[phab:T257266|T257266]] PT2/2 (duration: 00m 55s)
* 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 10:11 addshore@deploy1001: sync-file aborted: [[gerrit:609985]] Make testcommonswiki a testwikidata client [[phab:T257266|T257266]] PT1/2 (duration: 00m 00s)
* 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
* 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315', diff saved to https://phabricator.wikimedia.org/P11778 and previous config saved to /var/cache/conftool/dbconfig/20200707-101043-marostegui.json
* 15:10 godog: +100G to prometheus/ops in codfw
* 10:10 addshore@deploy1001: Synchronized dblists/wikidataclient-test.dblist: [[gerrit:609985]] Make testcommonswiki a testwikidata client [[phab:T257266|T257266]] PT1/2 (duration: 00m 56s)
* 14:59 vgutierrez: rolling restart of eqiad pybal instances
* 10:08 addshore@deploy1001: sync-file aborted: [[gerrit:609985]] Make testcommonswiki a testwikidata client [[phab:T257266|T257266]] PT1/2 (duration: 00m 36s)
* 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 10:06 elukey: decommission archiva1001
* 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
* 10:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 14:42 vgutierrez: rolling restart of codfw pybal instances
* 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11777 and previous config saved to /var/cache/conftool/dbconfig/20200707-100328-marostegui.json
* 14:33 vgutierrez: rolling restart of eqsin pybal instances
* 10:03 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
* 14:23 vgutierrez: rolling restart of ulsfo pybal instances
* 10:03 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 10:03 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P11776 and previous config saved to /var/cache/conftool/dbconfig/20200707-095443-marostegui.json
* 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P11775 and previous config saved to /var/cache/conftool/dbconfig/20200707-095428-marostegui.json
* 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 09:42 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [[gerrit:609971]] [[phab:T257266|T257266]] Enable sitelinks to testcommons from test wikidata sites (duration: 00m 56s)
* 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
* 09:40 kormat@cumin1001: dbctl commit (dc=all): 'Repool es2021 after reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11774 and previous config saved to /var/cache/conftool/dbconfig/20200707-094017-kormat.json
* 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 09:37 addshore@deploy1001: Synchronized wmf-config: [[gerrit:609986]] [[phab:T257266|T257266]] [[phab:T241975|T241975]] Wikibase: Remove config option wmgUseEntitySourceBasedFederation (take2) (duration: 00m 57s)
* 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:36 _joe_: errata: restbase2010, not 2009
* 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
* 09:36 _joe_: applying the new configuration using the service proxy to restbase2009 too
* 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
* 09:34 godog: bounce logstash on logstash1023
* 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 09:33 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: [[gerrit:609645]] [[phab:T257266|T257266]] [[phab:T241975|T241975]] Wikibase: stop using wmgUseEntitySourceBasedFederation (take2) (duration: 00m 59s)
* 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
* 09:33 _joe_: depooling restbase1025 while we fix the troubled relationship between envoy and proton
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
* 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P11773 and previous config saved to /var/cache/conftool/dbconfig/20200707-093345-marostegui.json
* 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
* 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Remove weight from es1024 as it is the current master [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11772 and previous config saved to /var/cache/conftool/dbconfig/20200707-092635-marostegui.json
* 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 09:24 James_F: 1.35.0-wmf.40 was branched at {{Gerrit|88ecd6df00a46e432c06c1cf40d5098128abc4d8}} for [[phab:T256668|T256668]]
* 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
* 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1023 after reimage [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11771 and previous config saved to /var/cache/conftool/dbconfig/20200707-092357-marostegui.json
* 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
* 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1023 after reimage [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11770 and previous config saved to /var/cache/conftool/dbconfig/20200707-091015-marostegui.json
* 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
* 08:33 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
* 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1023 after reimage [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11769 and previous config saved to /var/cache/conftool/dbconfig/20200707-083144-marostegui.json
* 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
* 08:30 kormat@cumin2001: START - Cookbook sre.hosts.downtime
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
* 08:26 XioNoX: cr2-codfw> request vmhost snapshot routing-engine both - [[phab:T257153|T257153]]
* 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
* 08:22 XioNoX: cr2-eqsin> request vmhost snapshot - [[phab:T257153|T257153]]
* 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
* 08:19 XioNoX: cr2-eqord> request vmhost snapshot - [[phab:T257153|T257153]]
* 11:40 moritzm: installing bluez security updates
* 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1023 after reimage [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11768 and previous config saved to /var/cache/conftool/dbconfig/20200707-081909-marostegui.json
* 11:31 Lucas_WMDE: EU backport+config window done
* 08:18 elukey@cumin1001: END (ERROR) - Cookbook sre.hadoop.change-distro (exit_code=97)
* 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:703205{{!}}Add config for updated PropertySuggester beta cluster (T285098)]] (beta-only) (duration: 00m 57s)
* 08:17 XioNoX: cr2-eqdfw> request vmhost snapshot - [[phab:T257153|T257153]]
* 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 08:15 XioNoX: cr3-knams> request vmhost snapshot - [[phab:T257153|T257153]]
* 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 08:15 hashar: upgrading and restart CI Jenkins on contint2001 # [[phab:T256978|T256978]]
* 09:52 moritzm: imported megacli for bullseye-wikimedia [[phab:T282272|T282272]] [[phab:T275873|T275873]]
* 08:12 XioNoX: cr4-ulsfo> request vmhost snapshot - [[phab:T257153|T257153]]
* 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
* 08:09 kormat@cumin1001: dbctl commit (dc=all): 'Depool es2021 for reimaging [[phab:T257284|T257284]]', diff saved to https://phabricator.wikimedia.org/P11767 and previous config saved to /var/cache/conftool/dbconfig/20200707-080914-kormat.json
* 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
* 07:50 marostegui: Stop MySQL on db1074 to deploy schema change and remove triggers - [[phab:T238966|T238966]]
* 08:15 vgutierrez: depool codfw text traffic
* 07:45 _joe_: restarting restbase again on rb1025
* 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
* 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 for schema change', diff saved to https://phabricator.wikimedia.org/P11766 and previous config saved to /var/cache/conftool/dbconfig/20200707-074435-marostegui.json
* 03:26 twentyafterfour: restarted phd on phab1001
* 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 03:25 twentyafterfour: investigating PHD failure
* 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079 and db1136  [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11765 and previous config saved to /var/cache/conftool/dbconfig/20200707-073918-marostegui.json
* 07:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
* 07:31 _joe_: restarting restbase on restbase1025, reaching proton via envoy for now
* 07:31 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:609644{{!}}Revert "Commons: Define entity sources configuration" (T256906, T256907, T256909, T254315, T257266)]] (forgot to git rebase so the last sync was a no-op) (duration: 00m 56s)
* 07:27 elukey@cumin1001: START - Cookbook sre.hadoop.change-distro
* 07:27 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:609644{{!}}Revert "Commons: Define entity sources configuration" (T256906, T256907, T256909, T254315, T257266)]] (duration: 00m 53s)
* 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 and give more main weight to db1136  [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11764 and previous config saved to /var/cache/conftool/dbconfig/20200707-072703-marostegui.json
* 07:24 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/: Config: [[gerrit:609643{{!}}Revert "Wikidata client wikis: Define entity sources configuration (take 2)" (T254315, T257266)]] (duration: 00m 56s)
* 07:24 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 07:23 lucaswerkmeister-wmde@deploy1001: Synchronized dblists/wikidataclient.dblist: Config: [[gerrit:609643{{!}}Revert "Wikidata client wikis: Define entity sources configuration (take 2)" (T254315, T257266)]] (duration: 00m 56s)
* 07:19 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:609642{{!}}Revert "Wikibase: stop using wmgUseEntitySourceBasedFederation" (T241975, T257266)]] (duration: 00m 55s)
* 07:16 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 07:15 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:609641{{!}}Revert "Wikibase: Remove config option wmgUseEntitySourceBasedFederation" (T241975, T257266)]] (duration: 00m 57s)
* 07:10 _joe_: restart restbase on restbase1025 to pick up the switch to https for cxserver
* 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 and give more main weight to db1136  [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11762 and previous config saved to /var/cache/conftool/dbconfig/20200707-063737-marostegui.json
* 06:29 marostegui: Reimage es1023 to Buster [[phab:T255755|T255755]]
* 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Give db1136 some weight back into main traffic [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11761 and previous config saved to /var/cache/conftool/dbconfig/20200707-062008-marostegui.json
* 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079 [[phab:T257216|T257216]]', diff saved to https://phabricator.wikimedia.org/P11760 and previous config saved to /var/cache/conftool/dbconfig/20200707-061849-marostegui.json
* 05:26 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Enable es5 writes [[phab:T255755|T255755]] (duration: 00m 56s)
* 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1023 entirely [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11759 and previous config saved to /var/cache/conftool/dbconfig/20200707-051620-marostegui.json
* 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1024 to es5 master [[phab:T255755|T255755]]', diff saved to https://phabricator.wikimedia.org/P11758 and previous config saved to /var/cache/conftool/dbconfig/20200707-051236-marostegui.json
* 05:02 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Disable es5 writes [[phab:T255755|T255755]] (duration: 00m 56s)
* 05:01 marostegui: "Starting es failover from es1023 to es1024 - https://phabricator.wikimedia.org/T255755"
* 01:05 ejegg: turned on debug logging for Adyen SmashPig
* 00:22 cstone: civicrm revision changed from {{Gerrit|a48caf0f37}} to {{Gerrit|d73ee2e73f}}


== 2020-07-06 ==
== 2021-07-16 ==
* 23:32 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable sidebar instrumentation on test wikipedia (duration: 00m 56s)
* 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:32 eileen: process-control config revision is {{Gerrit|3fe6753e56}}
* 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 23:22 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Change some zh canonical namespaces. Don
* 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
* 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
* 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
* 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P<nowiki>{</nowiki>elastic2*<nowiki>}</nowiki>' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
* 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
* 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
* 15:48 vgutierrez: restart pybal on lvs2010
* 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
* 15:24 godog: downtime flappy pages in codfw for 40 minutes
* 15:14 godog: set alert2001 as active in netbox (was staged) - [[phab:T247966|T247966]]
* 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
* 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
* 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw [[phab:T286787|T286787]]
* 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw


== 2020-07-05 ==
== 2021-07-05 ==
* 21:50 qchris: Restarting gerrit on gerrit1001 to pick up new war and jars.
* 17:40 legoktm: published fixed docker-registry.discovery.wmnet/nodejs10-devel:0.0.4 image ([[phab:T286212|T286212]])
* 21:50 qchris@deploy1001: Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1001 (duration: 00m 07s)
* 15:24 _joe_: leaving wdqs1007 depooled so that the updater can recover faster, now at 16.5 hours of lag
* 21:50 qchris@deploy1001: Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1001
* 14:01 moritzm: uploaded nginx 1.13.9-1+wmf3 for stretch-wikimedoa
* 21:46 qchris: Restarting gerrit on gerrit2001 to pick up new war and jars.
* 12:50 marostegui: Stop MySQL on db1117:3321 to clone db1125 [[phab:T286042|T286042]]
* 21:45 qchris@deploy1001: Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit2001 (duration: 00m 10s)
* 11:29 moritzm: installing openexr security updates on stretch
* 21:45 qchris@deploy1001: Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13, zuul plugin to master-0-g7accc67, and gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit2001
* 11:07 moritzm: installing tiff security updates on stretch
* 21:32 qchris: Restarting gerrit on gerrit1002 to pick up new wars and jars.
* 10:48 moritzm: upgrading PHP on miscweb*
* 21:32 qchris@deploy1001: Finished deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13 and zuul plugin to master-0-g7accc67 (duration: 00m 08s)
* 10:37 jbond: enable puppet  fleet wide to post puppetdb change
* 21:32 qchris@deploy1001: Started deploy [gerrit/gerrit@fbd0684]: Bump gerrit to 3.2.2-102-g3bbb138e13 and zuul plugin to master-0-g7accc67
* 10:29 marostegui: Optimize ruwiki.logging on s6 eqiad with replication [[phab:T286102|T286102]]
* 21:20 qchris: Enable puppet on gerrit1002 (gerrit-test) again to let it catch up again
* 10:27 jbond: disable puppet fleet wide to preforem puppetdb change
* 16:01 gehel: restart elastic-psi on elastic1052 (high GC rate)
* 08:15 moritzm: rolling out debmonitor-client 0.3.0
* 15:56 gehel: restart blazegraph + updater on wdqs1007 and depool to allow catching up on lag
* 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
* 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
* 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
* 07:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
* 07:04 _joe_: restarting blazegraph, then restarting the updater again
* 06:48 moritzm: start rasdaemon on sretest1001, didn't start after last reboot from a week ago
* 06:47 _joe_: restart wdqs-updater on wdqs1007
* 00:53 eileen: process-control config revision is {{Gerrit|a1717c7fde}}
* 00:47 eileen: process-control config revision is {{Gerrit|24565578f7}}


== 2020-07-04 ==
== 2021-07-04 ==
* 19:23 qchris@deploy1001: Finished deploy [gerrit/gerrit@b78914b]: Bump gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1002 (duration: 00m 08s)
* 17:43 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:702957{{!}}Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage" (T286140)]] (duration: 01m 06s)
* 19:23 qchris@deploy1001: Started deploy [gerrit/gerrit@b78914b]: Bump gitiles to v3.2.2-1-g00c5ca0-with-0e3b533 on gerrit1002
* 08:02 elukey: repool eqsin after equinix maintenance - [[phab:T286113|T286113]]
* 14:05 qchris: Disable puppet on gerrit1002 (gerrit-test) to deploy Gerrit UI updates there to gather feedback
* 12:42 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 24s)
* 02:28 reedy@deploy1001: Synchronized php-1.35.0-wmf.39/extensions/Score/includes/Score.php: Short circuit lilypond version check to allow usage of cached files [[phab:T257066|T257066]] (duration: 00m 55s)


== 2020-07-03 ==
== 2021-07-03 ==
* 21:49 reedy@deploy1001: Synchronized php-1.35.0-wmf.39/extensions/Score/: Sync maintenance script (duration: 00m 58s)
* 17:46 elukey: depool eqsin due to loss of power redundancy (equinix maintenance) - [[phab:T286113|T286113]]
* 18:47 cdanis: ✔️ cdanis@an-coord1001.eqiad.wmnet ~ 🕒☕ sudo systemctl restart hive-server2.service
* 09:12 Amir1: restarting mailman3-web on lists1001 to pick up patches for [[phab:T283659|T283659]]
* 16:51 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: {{Gerrit|Ifa929b2ad4}} (duration: 00m 57s)
* 08:53 Amir1: patching postorius and mailmanclient on lists1001 for [[phab:T283659|T283659]]
* 16:02 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Rename wgRestrictionMethod to wgShellRestrictionMethod (duration: 00m 58s)
* 15:46 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:43 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
* 15:43 jynus@cumin1001: dbctl commit (dc=all): 'Reduce db1118 weight to spread load mode evenly', diff saved to https://phabricator.wikimedia.org/P11730 and previous config saved to /var/cache/conftool/dbconfig/20200703-154337-jynus.json
* 15:40 jayme@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:38 jayme@cumin1001: START - Cookbook sre.ganeti.makevm
* 15:09 elukey@cumin1001: END (PASS) - Cookbook sre.hadoop.stop-cluster (exit_code=0)
* 15:02 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 14:11 elukey@cumin1001: END (FAIL) - Cookbook sre.hadoop.stop-cluster (exit_code=99)
* 14:11 _joe_: restarted php-fpm on wtp1033, stuck in sigill
* 13:59 elukey@cumin1001: START - Cookbook sre.hadoop.stop-cluster
* 12:41 hashar: Restarting Zuul / CI
* 11:39 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:36 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 11:29 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 11:29 moritzm: rebooting urldownloader standby hosts for kernel updates (1002/2002)
* 10:59 moritzm: installing json-c security updates on jessie
* 10:51 moritzm: installing ruby-json security updates
* 10:25 moritzm: installing nss security updates on jessie
* 10:18 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 10:15 elukey: notebook1004 renamed to an-scheduler1001
* 10:15 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 10:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 10:07 elukey@cumin1001: START - Cookbook sre.hosts.downtime
* 09:06 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 09:04 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 09:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:58 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:58 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:56 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:55 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:51 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:47 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:43 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:43 moritzm: rebooting netflow* hosts for kernel security update
* 08:16 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 08:14 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 08:04 jayme: authdns-update for chartmuseum - [[phab:T256970|T256970]]
* 08:03 elukey@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 07:55 moritzm: installing mutt security updates for jessie (stretch/buster already fixed)
* 07:44 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 07:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
* 07:39 elukey@cumin1001: START - Cookbook sre.hosts.decommission
* 07:00 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 07:00 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 06:47 moritzm: installing php5 security updates
* 06:27 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 06:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 06:09 moritzm: rebooting mw1390-mw1419 for kernel security updates
* 05:46 XioNoX: remove chassis redundancy failover from fasw-c-eqiad for consistency with all other VCs
* 05:33 XioNoX: remove chassis redundancy failover from fasw-c-codfw for consistency with all other VCs


== 2020-07-02 ==
== 2021-07-02 ==
* 23:22 jhuneidi@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 22:06 foks: removing three files for legal compliance
* 23:16 jhuneidi@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:03 jhuneidi@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 19:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:56 mutante: gerrit1001 (prod gerrit) - restarting gerrit service
* 18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:52 maryum: frwikibooks reindex sucessful, continuing on with remainder of french wikis
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:32 mutante: gerrit - deleted gerrit db_pass from prod private repo, running puppet
* 18:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 21:25 mutante: gerrit2001 - restarted gerrit
* 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 21:14 mutante: gerrit1002 restarted gerrit
* 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:20 maryum: reindexing frwikibooks to test https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/604221
* 18:22 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:52 mutante: gerrit2001 - restarting gerrit after removing db_pass from config
* 18:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:59 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0)
* 15:54 kormat@cumin1001: START - Cookbook sre.dns.netbox
* 15:37 jmm@cumin2001: START - Cookbook sre.hosts.reboot-single
* 15:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 15:17 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dragonfly-supernode1001.eqiad.wmnet
* 15:19 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:07 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 15:07 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 15:05 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dragonfly-supernode1001.eqiad.wmnet
* 15:07 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 15:02 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 14:42 moritzm: rebooting mw1370-mw1389 for kernel security updates
* 14:54 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
* 14:33 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:53 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
* 14:33 jmm@cumin2001: START - Cookbook sre.hosts.downtime
* 14:52 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
* 14:03 kormat: stopped mariadb@s8 on dbstore1005 for data restoration [[phab:T256966|T256966]]
* 14:40 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[0-1].eqiad.wmnet
* 12:43 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
* 14:39 jelto@cumin1001: