
Server Admin Log: Difference between revisions

imported>Stashbot
(hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master)
imported>Stashbot
(pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2435'])
 
(480 intermediate revisions by 3 users not shown)
== 2021-08-27 ==
== 2023-02-08 ==
* 16:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 01:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2435']
* 16:46 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 01:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2434']
* 14:50 akosiaris: stop flink on staging cluster to verify some IOPS starvation issues
* 01:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2435']
* 14:46 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
* 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2433']
* 14:45 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
* 01:00 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2434']
* 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 00:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2432']
* 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 00:52 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2433']
* 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
* 00:52 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2432']
* 14:44 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
* 00:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2431']
* 14:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 00:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2430']
* 14:38 hnowlan@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
* 00:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2431']
* 14:37 hnowlan@cumin1001: START - Cookbook sre.postgresql.postgres-init
* 00:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2430']
* 14:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 00:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2429']
* 14:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1005.eqiad.wmnet with reason: Resyncing from master
* 00:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2428']
* 13:48 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 00:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2429']
* 12:49 mutante: rsynced /srv/org/wikimedia/racktables from miscweb1002 to miscweb2002 ([[phab:T269746|T269746]])
* 00:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2427']
* 12:04 topranks: removing peering to Wave Division Holdings / AS11404 at Equinix Chicago cr2-eqord, AS no longer on exchange.
* 00:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2428']
* 10:56 akosiaris: sudo cumin 'mw*' 'ip ro ls dev docker0 && sysctl net.ipv4.ip_forward=0' to clear up the docker remnants of the dragonfly evaluation. [[phab:T286054|T286054]]
* 00:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mw2426']
* 10:31 godog: bounce logstash on logstash1007
* 00:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2427']
* 10:22 elukey: fallback codfw ores to rdb2007 after maintenance
* 00:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2426']
* 10:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
* 00:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mw2424']
* 10:12 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
* 00:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['mw2425']
* 09:49 elukey: restart ores uwsgi/celery workers to failover rdb2007 to rdb2008 (and ease the reboot of rdb2007)
* 09:33 topranks: Running homer against mr1-ulsfo to force OOB interface to 100Mb/full-duplex - [[phab:T288343|T288343]]
* 09:25 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Update to expose int type from Netbox - cmooney@cumin1001
* 09:25 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Update to expose int type from Netbox - cmooney@cumin1001
* 09:23 cmooney@deploy1002: Finished deploy [homer/deploy@8183056]: Homer update exposing interface type from Netbox - [[phab:T288343|T288343]] (duration: 01m 28s)
* 09:21 cmooney@deploy1002: Started deploy [homer/deploy@8183056]: Homer update exposing interface type from Netbox - [[phab:T288343|T288343]]
* 08:05 tstarling@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/SecurePoll/cli/wm-scripts/sendMail.php: (no justification provided) (duration: 00m 56s)
* 07:49 jayme: stopped kube-apiserver on kubestagemaster2001 for testing
* 07:49 jayme: stopped kube-apiserver on kubestage2001 for testing
* 07:00 godog: bounce logstash on logstash1008
* 06:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:41 tstarling@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/SecurePoll/cli/wm-scripts/sendMail.php: (no justification provided) (duration: 00m 56s)
* 06:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:44 legoktm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/PageTriage/: Revert backbone.js and underscore.js updates ([[phab:T289825|T289825]]) (duration: 01m 06s)


== 2021-08-26 ==
== 2023-02-07 ==
* 22:06 legoktm: restarted mailman3-web on lists1001 ([[phab:T289798|T289798]])
* 23:56 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2425']
* 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:56 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2424']
* 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mw2423']
* 19:02 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.20
* 23:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mw2422']
* 18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2423']
* 18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 23:32 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2422']
* 18:26 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mw2421']
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:30 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['mw2420']
* 18:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|66717bc039f40336144dcc0dfd97ff5331b418e9}}: Install Extension Quiz on ja.wikibooks ([[phab:T289383|T289383]]) (duration: 01m 05s)
* 23:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2421']
* 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mw2420']
* 18:16 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on durum1001.eqiad.wmnet with reason: testing out durum
* 23:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2434.mgmt.codfw.wmnet with reboot policy FORCED
* 18:16 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on durum1001.eqiad.wmnet with reason: testing out durum
* 23:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2435.mgmt.codfw.wmnet with reboot policy FORCED
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2435.mgmt.codfw.wmnet with reboot policy FORCED
* 18:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|cde88918b73628f2eaaff919ddb869b4dc2c93c6}}: Install Extension Quiz on fa.wikibooks ([[phab:T289381|T289381]]) (duration: 01m 07s)
* 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2434.mgmt.codfw.wmnet with reboot policy FORCED
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2432.mgmt.codfw.wmnet with reboot policy FORCED
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2433.mgmt.codfw.wmnet with reboot policy FORCED
* 18:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d4340e9c18468d14885c8ced87f1e014a3481f2a}}: Finalize Event Platform migration of EchoEmail and EchoInteraction ([[phab:T287210|T287210]]) (duration: 01m 07s)
* 22:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2433.mgmt.codfw.wmnet with reboot policy FORCED
* 17:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2432.mgmt.codfw.wmnet with reboot policy FORCED
* 17:38 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in B8 - pt1979@cumin2002"
* 17:30 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.20 (duration: 01m 05s)
* 22:43 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in B8 - pt1979@cumin2002"
* 17:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2430.mgmt.codfw.wmnet with reboot policy FORCED
* 17:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.20
* 22:41 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:26 dancy@deploy1002: Synchronized php-1.37.0-wmf.20/includes/page/PageStore.php: Backport: [[gerrit:714864{{!}}PageStore: Pass query flags to getPageById() too (T289717 T195069)]] (duration: 01m 05s)
* 22:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2431.mgmt.codfw.wmnet with reboot policy FORCED
* 16:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 22:31 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2431.mgmt.codfw.wmnet with reboot policy FORCED
* 16:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 22:31 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2430.mgmt.codfw.wmnet with reboot policy FORCED
* 15:56 sukhe: ran homer for Gerrit 715007: Set up BGP peering to durum1001 in eqiad
* 22:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2429.mgmt.codfw.wmnet with reboot policy FORCED
* 15:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 22:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2428.mgmt.codfw.wmnet with reboot policy FORCED
* 15:40 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 22:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2429.mgmt.codfw.wmnet with reboot policy FORCED
* 14:24 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=plwiki --prune --batch-size=10 --sleep=2 ([[phab:T289249|T289249]])
* 22:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2428.mgmt.codfw.wmnet with reboot policy FORCED
* 13:19 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
* 22:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:15 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
* 22:15 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in B6 - pt1979@cumin2002"
* 13:04 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
* 22:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for new mw nodes in B6 - pt1979@cumin2002"
* 12:59 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
* 22:12 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 12:57 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 22:10 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "provision new Ganeti VM an-airflow1005 - bking@cumin1001 - [[phab:T327970|T327970]]"
* 12:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 22:08 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:884333{{!}}Allow AbuseFilter to block IPs and users on itwikiversity (T328194)]] (duration: 08m 23s)
* 12:21 sukhe: running puppet initial run on durum1001.eqiad.wmnet - [[phab:T289536|T289536]]
* 22:07 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "provision new Ganeti VM an-airflow1005 - bking@cumin1001 - [[phab:T327970|T327970]]"
* 11:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:02 urbanecm@deploy1002: urbanecm and superpes: Backport for [[gerrit:884333{{!}}Allow AbuseFilter to block IPs and users on itwikiversity (T328194)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 11:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:00 urbanecm@deploy1002: Started scap: Backport for [[gerrit:884333{{!}}Allow AbuseFilter to block IPs and users on itwikiversity (T328194)]]
* 11:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:59 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:886983{{!}}Change the trwiki logo with a temporary one (old vector) (T329047)]] (duration: 10m 20s)
* 11:40 Lucas_WMDE: EU backport+config window done
* 21:51 urbanecm@deploy1002: superpes and urbanecm: Backport for [[gerrit:886983{{!}}Change the trwiki logo with a temporary one (old vector) (T329047)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 11:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:49 urbanecm@deploy1002: Started scap: Backport for [[gerrit:886983{{!}}Change the trwiki logo with a temporary one (old vector) (T329047)]]
* 11:39 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: [[gerrit:714853{{!}}Allow rendering of <nowiki><math>0</math></nowiki> (T288846)]] (duration: 01m 04s)
* 21:48 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:886416{{!}}Install WikiLove extension on bnwikiquote (T328834)]] (duration: 15m 32s)
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: [[gerrit:714854{{!}}Allow rendering of <nowiki><math>0</math></nowiki> (T288846)]] (duration: 01m 05s)
* 21:35 urbanecm@deploy1002: superpes and urbanecm: Backport for [[gerrit:886416{{!}}Install WikiLove extension on bnwikiquote (T328834)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 11:32 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum1001.eqiad.wmnet
* 21:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2051.codfw.wmnet with OS bullseye
* 11:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host durum1001.eqiad.wmnet
* 21:33 urbanecm: Create extension tables for Wikilove on bnwikiquote ([[phab:T328834|T328834]])
* 11:20 nikerabbit@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: [[gerrit:714770{{!}}Rename wgTranslateBlacklist to wgTranslateDisabledTargetLanguages]] (duration: 01m 05s)
* 21:33 urbanecm@deploy1002: Started scap: Backport for [[gerrit:886416{{!}}Install WikiLove extension on bnwikiquote (T328834)]]
* 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2426.mgmt.codfw.wmnet with reboot policy FORCED
* 11:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:31 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:887353{{!}}Disable languages on history page (T328996)]], [[gerrit:887351{{!}}Remove button styling from log in link (T289212)]], [[gerrit:887350{{!}}[followup] mediawiki.feedlink: Atom's link icon overlaps the link (T327717)]] (duration: 11m 10s)
* 10:09 vgutierrez: rolling restart of varnishkafka-statsv - [[phab:T289618|T289618]]
* 21:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1053.eqiad.wmnet with OS bullseye
* 10:07 vgutierrez: disable puppet on cp-text to merge {{Gerrit|I52cf2a573980e33487d1f05f19b192ae7d13d717}} - [[phab:T286038|T286038]]
* 21:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2427.mgmt.codfw.wmnet with reboot policy FORCED
* 10:06 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
* 21:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2427.mgmt.codfw.wmnet with reboot policy FORCED
* 10:01 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
* 21:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2427.mgmt.codfw.wmnet with reboot policy FORCED
* 09:36 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
* 21:22 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for [[gerrit:887353{{!}}Disable languages on history page (T328996)]], [[gerrit:887351{{!}}Remove button styling from log in link (T289212)]], [[gerrit:887350{{!}}[followup] mediawiki.feedlink: Atom's link icon overlaps the link (T327717)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 09:30 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
* 21:21 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2426.mgmt.codfw.wmnet with reboot policy FORCED
* 09:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
* 21:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2426.mgmt.codfw.wmnet with reboot policy FORCED
* 09:21 elukey: elukey@kafka-main1001:~$ kafka acls --add --allow-principal User:CN=varnishkafka --producer --topic statsv - [[phab:T286038|T286038]]
* 21:20 urbanecm@deploy1002: Started scap: Backport for [[gerrit:887353{{!}}Disable languages on history page (T328996)]], [[gerrit:887351{{!}}Remove button styling from log in link (T289212)]], [[gerrit:887350{{!}}[followup] mediawiki.feedlink: Atom's link icon overlaps the link (T327717)]]
* 09:21 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
* 21:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2051.codfw.wmnet with reason: host reimage
* 09:20 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
* 21:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2427.mgmt.codfw.wmnet with reboot policy FORCED
* 09:17 elukey: restart varnishkafka-statsv on cp4032 to pick up TLS settings
* 21:15 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1053.eqiad.wmnet with reason: host reimage
* 09:15 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
* 21:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2051.codfw.wmnet with reason: host reimage
* 09:15 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
* 21:12 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1053.eqiad.wmnet with reason: host reimage
* 09:13 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
* 21:12 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2426.mgmt.codfw.wmnet with reboot policy FORCED
* 09:12 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
* 21:02 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventStreams - Fix android session schema path (duration: 07m 26s)
* 09:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
* 21:01 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1053.eqiad.wmnet with OS bullseye
* 08:52 vgutierrez: restart varnishkafka-statsv on cp4032
* 20:58 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2051.codfw.wmnet with OS bullseye
* 06:59 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1138.eqiad.wmnet with reason: REIMAGE
* 20:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2050.codfw.wmnet with OS bullseye
* 06:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1138.eqiad.wmnet with reason: REIMAGE
* 20:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1051.eqiad.wmnet with OS bullseye
* 06:48 godog: more weight to ms-be20[62-65] - [[phab:T288458|T288458]]
* 20:44 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1005.eqiad.wmnet
* 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1160 [[phab:T288273|T288273]]', diff saved to https://phabricator.wikimedia.org/P17085 and previous config saved to /var/cache/conftool/dbconfig/20210826-064655-marostegui.json
* 20:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2050.codfw.wmnet with reason: host reimage
* 06:43 marostegui: Reimage s4 eqiad master (db1138), expect lag on eqiad [[phab:T288803|T288803]]
* 20:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2050.codfw.wmnet with reason: host reimage
* 06:37 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1051.eqiad.wmnet with reason: host reimage
* 06:33 elukey@cumin1001: START - Cookbook sre.dns.netbox
* 20:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1051.eqiad.wmnet with reason: host reimage
* 20:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2050.codfw.wmnet with OS bullseye
* 20:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1051.eqiad.wmnet with OS bullseye
* 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2425.mgmt.codfw.wmnet with reboot policy FORCED
* 20:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
* 20:08 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2425.mgmt.codfw.wmnet with reboot policy FORCED
* 20:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
* 19:59 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
* 19:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2425.mgmt.codfw.wmnet with reboot policy FORCED
* 19:57 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1005.eqiad.wmnet on all recursors
* 19:57 bking@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1005.eqiad.wmnet on all recursors
* 19:57 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:57 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-airflow1005.eqiad.wmnet - bking@cumin1001"
* 19:56 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-airflow1005.eqiad.wmnet - bking@cumin1001"
* 19:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
* 19:55 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.22  refs [[phab:T325585|T325585]]
* 19:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
* 19:53 bking@cumin1001: START - Cookbook sre.dns.netbox
* 19:53 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1005.eqiad.wmnet
* 19:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2425.mgmt.codfw.wmnet with reboot policy FORCED
* 19:47 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2424.mgmt.codfw.wmnet with reboot policy FORCED
* 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2423.mgmt.codfw.wmnet with reboot policy FORCED
* 19:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2422.mgmt.codfw.wmnet with reboot policy FORCED
* 19:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2423.mgmt.codfw.wmnet with reboot policy FORCED
* 19:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2422.mgmt.codfw.wmnet with reboot policy FORCED
* 19:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2423.mgmt.codfw.wmnet with reboot policy FORCED
* 19:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2422.mgmt.codfw.wmnet with reboot policy FORCED
* 19:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2049.codfw.wmnet with OS bullseye
* 19:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1049.eqiad.wmnet with OS bullseye
* 19:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2049.codfw.wmnet with reason: host reimage
* 19:20 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2049.codfw.wmnet with reason: host reimage
* 19:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1049.eqiad.wmnet with reason: host reimage
* 19:15 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1049.eqiad.wmnet with reason: host reimage
* 19:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1049.eqiad.wmnet with OS bullseye
* 19:03 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2049.codfw.wmnet with OS bullseye
* 19:03 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2423.mgmt.codfw.wmnet with reboot policy FORCED
* 19:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2422.mgmt.codfw.wmnet with reboot policy FORCED
* 19:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2423,25,26,27 DNS - pt1979@cumin2002"
* 19:00 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2423,25,26,27 DNS - pt1979@cumin2002"
* 18:57 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2048.codfw.wmnet with OS bullseye
* 18:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1047.eqiad.wmnet with OS bullseye
* 18:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2048.codfw.wmnet with reason: host reimage
* 18:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2048.codfw.wmnet with reason: host reimage
* 18:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1047.eqiad.wmnet with reason: host reimage
* 18:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1047.eqiad.wmnet with reason: host reimage
* 18:18 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2048.codfw.wmnet with OS bullseye
* 18:17 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1047.eqiad.wmnet with OS bullseye
* 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 13 hosts
* 18:02 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for 13 hosts
* 17:55 inflatador: bking@cumin1001 repooling elastic and wdqs hosts post-maintenance [[phab:T327925|T327925]]
* 17:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2047.codfw.wmnet with OS bullseye
* 17:51 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1046.eqiad.wmnet with OS bullseye
* 17:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2047.codfw.wmnet with reason: host reimage
* 17:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2047.codfw.wmnet with reason: host reimage
* 17:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1046.eqiad.wmnet with reason: host reimage
* 17:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1046.eqiad.wmnet with reason: host reimage
* 17:22 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1046.eqiad.wmnet with OS bullseye
* 17:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2047.codfw.wmnet with OS bullseye
* 16:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2046.codfw.wmnet with OS bullseye
* 16:48 urbanecm@deploy1002: Finished scap: {{Gerrit|58f4d877}}: Finalize mediawiki/page/change schema, produce at rc1.mediawiki.page_change ([[phab:T308017|T308017]]), {{Gerrit|854ff4ac}}: Finalize mediawiki/page/change schema at 1.0.0 ([[phab:T308017|T308017]]) (duration: 07m 32s)
* 16:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1045.eqiad.wmnet with OS bullseye
* 16:41 urbanecm@deploy1002: Started scap: {{Gerrit|58f4d877}}: Finalize mediawiki/page/change schema, produce at rc1.mediawiki.page_change ([[phab:T308017|T308017]]), {{Gerrit|854ff4ac}}: Finalize mediawiki/page/change schema at 1.0.0 ([[phab:T308017|T308017]])
* 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43765 and previous config saved to /var/cache/conftool/dbconfig/20230207-163902-root.json
* 16:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2046.codfw.wmnet with reason: host reimage
* 16:31 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2046.codfw.wmnet with reason: host reimage
* 16:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1045.eqiad.wmnet with reason: host reimage
* 16:26 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1045.eqiad.wmnet with reason: host reimage
* 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43764 and previous config saved to /var/cache/conftool/dbconfig/20230207-162357-root.json
* 16:18 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:886985{{!}}Restore mediawiki.page-undelete hook (T329064)]], [[gerrit:887346{{!}}Restore mediawiki.page-undelete hook (T329064)]] (duration: 17m 44s)
* 16:15 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2046.codfw.wmnet with OS bullseye
* 16:14 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1045.eqiad.wmnet with OS bullseye
* 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43763 and previous config saved to /var/cache/conftool/dbconfig/20230207-160852-root.json
* 16:02 urbanecm@deploy1002: urbanecm: Backport for [[gerrit:886985{{!}}Restore mediawiki.page-undelete hook (T329064)]], [[gerrit:887346{{!}}Restore mediawiki.page-undelete hook (T329064)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 16:00 urbanecm@deploy1002: Started scap: Backport for [[gerrit:886985{{!}}Restore mediawiki.page-undelete hook (T329064)]], [[gerrit:887346{{!}}Restore mediawiki.page-undelete hook (T329064)]]
* 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43762 and previous config saved to /var/cache/conftool/dbconfig/20230207-155347-root.json
* 15:53 moritzm: installing tiff security updates
* 15:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2045.codfw.wmnet with OS bullseye
* 15:47 urbanecm@deploy1002: Finished scap: {{Gerrit|20a79c55b7073e791e297a5389fa66819f596178}}: Don't add custom attributes in unwrapParsoidSections() ([[phab:T328268|T328268]]) (duration: 07m 34s)
* 15:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1043.eqiad.wmnet with OS bullseye
* 15:39 urbanecm@deploy1002: Started scap: {{Gerrit|20a79c55b7073e791e297a5389fa66819f596178}}: Don't add custom attributes in unwrapParsoidSections() ([[phab:T328268|T328268]])
* 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43761 and previous config saved to /var/cache/conftool/dbconfig/20230207-153842-root.json
* 15:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2045.codfw.wmnet with reason: host reimage
* 15:29 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2045.codfw.wmnet with reason: host reimage
* 15:28 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1043.eqiad.wmnet with reason: host reimage
* 15:26 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:886997{{!}}Add "Page Frame" to DiscussionTools beta feature on enwiki (T327456)]] (duration: 10m 39s)
* 15:25 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1043.eqiad.wmnet with reason: host reimage
* 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43760 and previous config saved to /var/cache/conftool/dbconfig/20230207-152337-root.json
* 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
* 15:17 urbanecm@deploy1002: matmarex and urbanecm: Backport for [[gerrit:886997{{!}}Add "Page Frame" to DiscussionTools beta feature on enwiki (T327456)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 15:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
* 15:15 urbanecm@deploy1002: Started scap: Backport for [[gerrit:886997{{!}}Add "Page Frame" to DiscussionTools beta feature on enwiki (T327456)]]
* 15:14 volans@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in eqiad: [[phab:T327925|T327925]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
* 15:13 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1043.eqiad.wmnet with OS bullseye
* 15:13 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2045.codfw.wmnet with OS bullseye
* 15:12 vgutierrez: repool codfw edge site - [[phab:T327925|T327925]]
* 15:09 volans@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
* 15:09 volans@cumin2002: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
* 15:09 volans@cumin2002: START - Cookbook sre.discovery.service-route depool restbase-async in eqiad: [[phab:T327925|T327925]]
* 15:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
* 15:07 volans@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter-route (exit_code=0) pool all active/active services in codfw: [[phab:T327925|T327925]]
* 15:05 marostegui: dbmaint deploy schema change on s8 [[phab:T328807|T328807]] [[phab:T328828|T328828]]
* 15:04 vgutierrez: restart pybal in lvs2010 - [[phab:T327925|T327925]]
* 15:01 marostegui: dbmaint deploy schema change on s6 [[phab:T328807|T328807]]
* 15:00 vgutierrez: restart pybal in lvs2009 - [[phab:T327925|T327925]]
* 14:59 marostegui: dbmaint deploy schema change on s6 [[phab:T328828|T328828]]
* 14:53 moritzm: adding nfraison to pwstore [[phab:T328915|T328915]]
* 14:46 volans@cumin2002: START - Cookbook sre.discovery.datacenter-route pool all active/active services in codfw: [[phab:T327925|T327925]]
* 14:40 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet,service=thanos-web
* 14:40 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2001.codfw.wmnet,service=thanos-web
* 14:36 claime: repooled appserver, api_appserver, jobrunner, parsoid - [[phab:T327925|T327925]]
* 14:36 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 14:36 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver
* 14:35 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=jobrunner
* 14:35 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver
* 14:35 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid
* 14:32 Emperor: pool ms-fe2009 (codfw as a whole still depooled) [[phab:T327925|T327925]]
* 14:28 jbond: enable puppet in codfw, ulsfo, esams post switch upgrade [[phab:T327925|T327925]]
* 14:26 claime: depooled appserver, api_appserver, jobrunner, parsoid - [[phab:T327925|T327925]]
* 14:25 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
* 14:21 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=parsoid
* 14:19 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=appserver
* 14:19 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=jobrunner
* 14:18 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=api_appserver
* 14:13 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet,service=thanos-web
* 14:13 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=thanos-fe2001.codfw.wmnet,service=thanos-web
* 14:08 jbond: disable puppet in codfw, ulsfo, esams for switch upgrade [[phab:T327925|T327925]]
* 14:07 lucaswerkmeister-wmde@deploy1002: backport aborted:  (duration: 17m 46s)
* 14:06 XioNoX: asw-a-codfw> request system reboot all-members  - [[phab:T327925|T327925]]
* 13:59 XioNoX: disable puppet in ulsfo/esams/codfw for codfw row A switch upgrade - [[phab:T327925|T327925]]
* 13:56 Emperor: depool ms-fe2009 [[phab:T327925|T327925]]
* 13:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2422 and 24 DNS - pt1979@cumin2002"
* 13:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mw2422 and 24 DNS - pt1979@cumin2002"
* 13:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 199 hosts with reason: codfw row A upgrade
* 13:32 oblivian@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter-route (exit_code=0) depool all active/active services in codfw: [[phab:T327925|T327925]]
* 13:31 vgutierrez: depool codfw edge site - [[phab:T327925|T327925]]
* 13:31 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 199 hosts with reason: codfw row A upgrade
* 13:13 jbond: enable puppet in codfw, ulsfo and esams to allow depools post switch upgrade [[phab:T327925|T327925]]
* 13:11 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter-route depool all active/active services in codfw: [[phab:T327925|T327925]]
* 13:05 jbond: disable puppet in codfw, ulsfo and esams for switch upgrade [[phab:T327925|T327925]]
* 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm6001.drmrs.wmnet
* 12:28 vgutierrez: depooling authdns2001 - [[phab:T327925|T327925]]
* 12:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on doh2001.wikimedia.org with reason: depooled; [[phab:T327925|T327925]]
* 12:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on doh2001.wikimedia.org with reason: depooled; [[phab:T327925|T327925]]
* 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm6001.drmrs.wmnet on all recursors
* 12:20 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache testvm6001.drmrs.wmnet on all recursors
* 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm6001.drmrs.wmnet - jmm@cumin2002"
* 12:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm6001.drmrs.wmnet - jmm@cumin2002"
* 12:17 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 12:17 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm6001.drmrs.wmnet
* 12:00 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1041.eqiad.wmnet with OS bullseye
* 11:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2044.codfw.wmnet with OS bullseye
* 11:56 marostegui: Install 10.4.28 on db1152 [[phab:T329011|T329011]]
* 11:52 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
* 11:44 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1041.eqiad.wmnet with reason: host reimage
* 11:41 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1041.eqiad.wmnet with reason: host reimage
* 11:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2044.codfw.wmnet with reason: host reimage
* 11:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2044.codfw.wmnet with reason: host reimage
* 11:33 moritzm: installing imagemagick security updates on buster
* 11:29 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1041.eqiad.wmnet with OS bullseye
* 11:21 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2044.codfw.wmnet with OS bullseye
* 10:51 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
* 10:49 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
* 10:19 oblivian@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter-route (exit_code=0) pool all active/active services in eqiad: Pooling eqiad for codfw depool today
* 10:19 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter-route pool all active/active services in eqiad: Pooling eqiad for codfw depool today
* 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast1003.wikimedia.org with OS bullseye
* 10:13 oblivian@cumin2002: END (FAIL) - Cookbook sre.discovery.datacenter-route (exit_code=93) pool all active/active services in eqiad: Pooling eqiad for codfw depool today
* 10:12 oblivian@cumin2002: START - Cookbook sre.discovery.datacenter-route pool all active/active services in eqiad: Pooling eqiad for codfw depool today
* 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
* 09:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
* 09:44 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast1003.wikimedia.org with OS bullseye
* 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast2002.wikimedia.org with OS bullseye
* 09:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 09:23 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 09:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2002.wikimedia.org with reason: host reimage
* 09:20 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 09:20 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 09:20 akosiaris: add wiktionary to mobile-sections rerenders. [[phab:T226931|T226931]]
* 09:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast2002.wikimedia.org with reason: host reimage
* 09:19 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 09:19 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 09:08 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
* 09:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast2002.wikimedia.org with OS bullseye
* 08:50 vgutierrez: rolling upgrade to HAProxy 2.4.21 in cp nodes
* 08:48 kostajh: UTC morning deploys done
* 08:48 kharlan@deploy1002: Finished scap: Backport for [[gerrit:883236{{!}}[Growth] Remove mentor list variables (T321501)]], [[gerrit:883153{{!}}Remove GEMentorProvider (T321501)]] (duration: 12m 48s)
* 08:37 kharlan@deploy1002: urbanecm and kharlan: Backport for [[gerrit:883236{{!}}[Growth] Remove mentor list variables (T321501)]], [[gerrit:883153{{!}}Remove GEMentorProvider (T321501)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:35 kharlan@deploy1002: Started scap: Backport for [[gerrit:883236{{!}}[Growth] Remove mentor list variables (T321501)]], [[gerrit:883153{{!}}Remove GEMentorProvider (T321501)]]
* 08:30 moritzm: installing imagemagick security updates on Thumbor [[phab:T328901|T328901]]
* 08:28 kharlan@deploy1002: Finished scap: Backport for [[gerrit:886343{{!}}GrowthExperiments: Disable leveling up features in production (T328757)]] (duration: 12m 11s)
* 08:18 kharlan@deploy1002: kharlan: Backport for [[gerrit:886343{{!}}GrowthExperiments: Disable leveling up features in production (T328757)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 08:16 kharlan@deploy1002: Started scap: Backport for [[gerrit:886343{{!}}GrowthExperiments: Disable leveling up features in production (T328757)]]
* 08:14 kharlan@deploy1002: backport aborted:  (duration: 00m 07s)
* 07:00 marostegui: Failover m3 from db1159 to db1164 - [[phab:T328404|T328404]]
* 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2110 in API', diff saved to https://phabricator.wikimedia.org/P43758 and previous config saved to /var/cache/conftool/dbconfig/20230207-063147-root.json
* 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1187', diff saved to https://phabricator.wikimedia.org/P43757 and previous config saved to /var/cache/conftool/dbconfig/20230207-062826-root.json
* 04:58 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.20 (duration: 02m 20s)
* 04:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.22 refs [[phab:T325585|T325585]] (duration: 53m 11s)
* 04:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.22 refs [[phab:T325585|T325585]]


== 2021-08-25 ==
== 2023-02-06 ==
* 23:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2421.mgmt.codfw.wmnet with reboot policy FORCED
* 23:22 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw2421.mgmt.codfw.wmnet with reboot policy FORCED
* 23:20 urbanecm: Evening B&C window completed
* 22:55 ryankemper: [[phab:T327925|T327925]] Depooled codfw wdqs hosts: `ryankemper@cumin2002:~$ sudo -E cumin -b 3 'wdqs[2003-2004,2009]*' 'sudo depool'`
* 23:19 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.20/extensions/GlobalWatchlist/modules/EntryLog.js: {{Gerrit|230aec3fe7f3d0e325882a5fc926e9f3e4e86717}}: GlobalWatchlistEntryLog: fix storing log id ([[phab:T288385|T288385]]) (duration: 01m 07s)
* 22:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 13 hosts with reason: switch upgrade
* 22:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:51 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 13 hosts with reason: switch upgrade
* 22:48 ryankemper: [[phab:T327925|T327925]] Banned `elastic[2037-2040,2055-2056,2061-2062,2069,2073-2076]` on codfw elastic


== 2021-08-24 ==
== 2023-02-05 ==
* 22:05 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 22:28 topranks: Re-enabling peering to Seabone/Telecom Italia AS 6762 on cr2-esams at AMS-IX
* 22:04 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 14:39 cdanis: silenced NELHigh alert for 20 hours: Telecom Italy issues; alertmanager silence id 3fb3b999-9756-44af-a1e8-fd1faae8b9bf
* 21:10 tgr: running extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php on various wikis per [[phab:T282873|T282873]]#7303828
* 11:49 topranks: Manually deactivating peering to Telecom Italia / Seabone at AMS-IX on cr2-esams as they are having issues
* 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:55 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a6fd96b15e6e3c068c2faac60208b9722d32af0f}}: Growth features: Promote 9 wikis out of dark mode ([[phab:T287871|T287871]]; [[phab:T287874|T287874]]; [[phab:T287872|T287872]]; [[phab:T287880|T287880]]; [[phab:T287868|T287868]]; [[phab:T287873|T287873]]; [[phab:T287879|T287879]]; [[phab:T287875|T287875]]; [[phab:T287876|T287876]]) (duration: 01m 25s)
* 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:35 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.17 (duration: 01m 48s)
* 20:33 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.18 (duration: 03m 26s)
* 20:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.20
* 20:18 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.20 (duration: 36m 32s)
* 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:41 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.20
* 17:23 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:19 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:17 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 15:26 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@e02c602]: transfer_to_es: stop adding data to article_topics (duration: 02m 17s)
* 15:23 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@e02c602]: transfer_to_es: stop adding data to article_topics
* 15:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:55 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 14:54 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 14:50 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
* 14:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2031.codfw.wmnet with reason: REIMAGE
* 13:12 XioNoX: push pfw policies - [[phab:T289353|T289353]]
* 12:45 vgutierrez: enable puppet on P:tlsproxy::envoy hosts - merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/710507/9
* 12:37 vgutierrez: disable puppet on P:tlsproxy::envoy hosts - merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/710507/9
* 12:33 godog: test patched python3-eventlet on thanos-fe1003 - [[phab:T283714|T283714]]
* 12:30 marostegui: Install 10.4.21 on clouddb1015
* 11:27 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
* 11:24 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2029.codfw.wmnet with reason: REIMAGE
* 09:08 jbond: upload new statograph version
* 09:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 09:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:54 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=dewiki --prune --batch-size=5 --sleep=5 ([[phab:T289249|T289249]])
* 08:51 Amir1: start of mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=arwiki --prune --batch-size=5 --sleep=5 ([[phab:T289249|T289249]])
* 08:01 godog: temp fix thanos-swift.discovery.wmnet in /etc/hosts to get swift-dispersion-stats to work - [[phab:T283714|T283714]]
* 07:51 dcausse: repool wdqs1012 [[phab:T289551|T289551]]
* 07:29 dcausse: restarting blazegraph on wdqs1012
* 07:17 marostegui: Optimize huwiki.flaggedtemplates on db1127
* 07:15 marostegui: Optimize huwiki.flaggedtemplates on db1098:3317
* 06:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
* 06:14 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2037.codfw.wmnet with reason: REIMAGE
* 03:51 rzl: rzl@wdqs1012:~$ sudo depool
* 03:46 legoktm: wdqs1012 restarted prometheus-blazegraph-exporter-wdqs-blazegraph.service and prometheus-blazegraph-exporter-wdqs-categories.service after apparent exceptions/crashes
* 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:17 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 00:17 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 00:17 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 00:16 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@da9efa9]: 0.3.83 (duration: 07m 05s)
* 00:10 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.83` on canary `wdqs1003`; proceeding to rest of fleet
* 00:09 ryankemper@deploy1002: Started deploy [wdqs/wdqs@da9efa9]: 0.3.83
* 00:08 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.83`. Pre-deploy tests passing on canary `wdqs1003`


== 2021-08-23 ==
== 2023-02-03 ==
* 23:41 ryankemper: [[phab:T285355|T285355]] `helmfile -e staging -i apply` on `/srv/deployment-charts/helmfile.d/services/linkrecommendation/` from `ryankemper@deploy1002`
* 21:05 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:40 ryankemper@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 21:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 18:56 tgr: morning deploys done
* 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:56 tgr@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/GrowthExperiments: Backport: [[gerrit:714158{{!}}Add Link: store when tasks were generated (T284551)]] (duration: 00m 57s)
* 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
* 18:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:02 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:00 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 18:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:52 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:49 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 18:27 dancy@deploy1002: Synchronized wmf-config/etcd.php: Config: [[gerrit:713907{{!}}wmfSetupEtcd only supports array input]] (duration: 00m 57s)
* 19:44 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1090.eqiad.wmnet
* 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1090.eqiad.wmnet with OS bullseye
* 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:00 dzahn@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "test what is not synced - dzahn@cumin2002"
* 18:23 dancy@deploy1002: Synchronized wmf-config: Config: [[gerrit:713906{{!}}Use array format to specify etcd server]] (duration: 00m 57s)
* 18:59 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test what is not synced - dzahn@cumin2002"
* 18:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1090.eqiad.wmnet with reason: host reimage
* 18:12 dancy@deploy1002: Synchronized wmf-config/etcd.php: Config: [[gerrit:713704{{!}}Allow protocol for etcd server to be specified]] (duration: 00m 57s)
* 18:49 topranks: Enabling 4x10G channelization for pic 0 QSFP 4 on cr1-codfw
* 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1090.eqiad.wmnet with reason: host reimage
* 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1090.eqiad.wmnet with OS bullseye
* 17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:23 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1088.eqiad.wmnet
* 17:17 ebernhardson@deploy1002: Finished deploy [search/airflow@4c49df7]: ship modern pip/wheel version to support manylinux2014 (pyarrow) (duration: 00m 56s)
* 18:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1088.eqiad.wmnet with OS bullseye
* 17:16 ebernhardson@deploy1002: Started deploy [search/airflow@4c49df7]: ship modern pip/wheel version to support manylinux2014 (pyarrow)
* 17:57 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp1088.eqiad.wmnet with reason: host reimage
* 16:37 ebernhardson@deploy1002: Finished deploy [search/airflow@32f5039]: Add pyarrow lib for hdfs integration (duration: 00m 35s)
* 17:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1088.eqiad.wmnet with reason: host reimage
* 16:37 ebernhardson@deploy1002: Started deploy [search/airflow@32f5039]: Add pyarrow lib for hdfs integration
* 17:39 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1089.eqiad.wmnet
* 16:24 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
* 17:36 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1089.eqiad.wmnet with OS bullseye
* 16:21 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2027.codfw.wmnet with reason: REIMAGE
* 17:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1088.eqiad.wmnet with OS bullseye
* 15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:34 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1086.eqiad.wmnet
* 15:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:34 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1086.eqiad.wmnet with OS bullseye
* 15:43 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
* 17:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1089.eqiad.wmnet with reason: host reimage
* 15:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 17:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
* 14:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1089.eqiad.wmnet with reason: host reimage
* 14:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|26fe6d7a380d4a798f78abf0e722e36c5c63df80}}: ckbwiki: Enable Growth features in dark mode ([[phab:T287867|T287867]]; 3/3) (duration: 00m 56s)
* 17:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
* 14:58 urbanecm@deploy1002: Synchronized wmf-config/config/ckbwiki.yaml: {{Gerrit|26fe6d7a380d4a798f78abf0e722e36c5c63df80}}: ckbwiki: Enable Growth features in dark mode ([[phab:T287867|T287867]]; 2/3) (duration: 00m 57s)
* 16:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1086.eqiad.wmnet with OS bullseye
* 14:57 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|26fe6d7a380d4a798f78abf0e722e36c5c63df80}}: ckbwiki: Enable Growth features in dark mode ([[phab:T287867|T287867]]; 1/3) (duration: 00m 57s)
* 16:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1089.eqiad.wmnet with OS bullseye
* 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:45 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:54 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki-staging/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=ckbwiki --phab=[[phab:T287867|T287867]] # [[phab:T287867|T287867]]
* 16:45 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
* 14:53 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki-staging/php]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=ckbwiki growthexperiments # [[phab:T287867|T287867]]
* 16:44 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for cloudsw1-b1-codfw mgmt IP. - cmooney@cumin1001"
* 14:29 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
* 14:26 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 16:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2012.codfw.wmnet
* 14:00 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons. - btullis@cumin1001
* 16:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2012.codfw.wmnet
* 13:57 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:51 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@598ff3c] (releasing): test (duration: 00m 26s)
* 13:56 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:51 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@598ff3c] (releasing): test
* 13:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 15:23 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@ec3e0de]: Hotfix disabling skein log collection (duration: 00m 15s)
* 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 15:22 milimetric@deploy1002: Started deploy [airflow-dags/analytics@ec3e0de]: Hotfix disabling skein log collection
* 12:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:31 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided) (duration: 00m 09s)
* 12:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:31 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided)
* 12:55 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:713619{{!}}ProductionServices: change rdb* servers in eqiad and codfw (T280582)]] (duration: 00m 57s)
* 14:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2011.codfw.wmnet
* 11:35 Lucas_WMDE: EU backport+config window done
* 14:19 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided) (duration: 00m 23s)
* 11:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:714334{{!}}Set $wgWBRepoSettings['tmpNormalizeDataValues'] on test wikis (T251480)]] (2/2) (duration: 00m 57s)
* 14:18 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided)
* 11:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:714334{{!}}Set $wgWBRepoSettings['tmpNormalizeDataValues'] on test wikis (T251480)]] (1/2) (duration: 00m 58s)
* 14:13 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2011.codfw.wmnet
* 11:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet,service=ats-be
* 11:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:55 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1087.eqiad.wmnet,service=cdn
* 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1087.eqiad.wmnet with OS bullseye
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
* 11:04 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:713860{{!}}Revert "Enable NewUserMessage on hiwiktionary" (T287091)]] (duration: 00m 57s)
* 13:25 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
* 10:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
* 13:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1087.eqiad.wmnet with OS bullseye
* 10:55 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2025.codfw.wmnet with reason: REIMAGE
* 12:09 moritzm: installing node-moment security updates
* 09:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:714152{{!}}Add extra sleep option between each batch in pruneRevData.php (T289249)]] (duration: 00m 58s)
* 12:01 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided) (duration: 00m 13s)
* 09:55 mbsantos: start re-import OSM planet data into maps1009 eqiad master ([[phab:T288400|T288400]], [[phab:T288897|T288897]])
* 12:00 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@598ff3c] (releasing): (no justification provided)
* 09:53 urbanecm: Deploy security patch for [[phab:T289408|T289408]]
* 11:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2010.codfw.wmnet
* 09:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:58 moritzm: installing node-qs security updates
* 09:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:50 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2010.codfw.wmnet
* 09:33 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift-ro,name=codfw
* 11:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2009.codfw.wmnet
* 09:33 filippo@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
* 11:28 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2009.codfw.wmnet
* 09:02 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro,name=eqiad
* 10:44 moritzm: updating perf on buster hosts
* 09:02 filippo@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=swift,name=eqiad
* 10:24 stevemunene@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 09:01 godog: pooling swift in eqiad - [[phab:T288458|T288458]]
* 10:11 stevemunene@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 07:59 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:09 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2008.codfw.wmnet
* 07:55 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:07 stevemunene@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 07:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:06 stevemunene@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
* 07:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:714322{{!}}Set request languages rdf output for wikidata to true (T285795)]] (duration: 00m 57s)
* 10:03 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2008.codfw.wmnet
* 07:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:51 moritzm: installing ruby-rack security updates
* 07:28 Amir1: running FlaggedRevs/maintenance/pruneRevData.php on all flaggedrevs wikis
* 09:31 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 07:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: [[gerrit:714151{{!}}Avoid calling delete() with empty arrays in PruneFRIncludeData (T289249)]] (duration: 00m 59s)
* 09:31 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 07:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE
* 09:24 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 07:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2023.codfw.wmnet with reason: REIMAGE
* 09:24 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 09:23 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 09:23 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 09:19 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1001.eqiad.wmnet
* 09:14 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1001.eqiad.wmnet
* 09:13 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 09:13 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 09:07 moritzm: installing modsecurity-crs security updates
* 09:02 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 09:02 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 05:16 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1085.eqiad.wmnet
* 05:16 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1084.eqiad.wmnet
* 05:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1084.eqiad.wmnet with OS bullseye
* 05:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1085.eqiad.wmnet with OS bullseye
* 04:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
* 04:47 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
* 04:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
* 04:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
* 04:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1084.eqiad.wmnet with OS bullseye
* 04:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1085.eqiad.wmnet with OS bullseye
* 04:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1083.eqiad.wmnet
* 04:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1082.eqiad.wmnet
* 04:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1083.eqiad.wmnet with OS bullseye
* 04:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1082.eqiad.wmnet with OS bullseye
* 03:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
* 03:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
* 03:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
* 03:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
* 03:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1082.eqiad.wmnet with OS bullseye
* 03:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1083.eqiad.wmnet with OS bullseye
* 03:20 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1080.eqiad.wmnet
* 03:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1080.eqiad.wmnet with OS bullseye
* 02:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
* 02:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
* 02:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1081.eqiad.wmnet,service=ats-be
* 02:28 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1081.eqiad.wmnet,service=cdn
* 02:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1081.eqiad.wmnet with OS bullseye
* 02:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS bullseye
* 02:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
* 02:00 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
* 01:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1081.eqiad.wmnet with OS bullseye
* 01:31 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1080.eqiad.wmnet with OS bullseye
* 00:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS bullseye


== 2021-08-21 ==
== 2023-02-02 ==
* 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:58 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1080.eqiad.wmnet with OS bullseye
* 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:15 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1079.eqiad.wmnet
* 22:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1079.eqiad.wmnet with OS bullseye
* 22:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS bullseye
* 22:00 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1078.eqiad.wmnet
* 21:58 zabe@deploy1002: Finished scap: Backport for [[gerrit:886149{{!}}Stop writing to cuc_comment everywhere (T233004)]] (duration: 07m 58s)
* 21:52 zabe@deploy1002: zabe: Backport for [[gerrit:886149{{!}}Stop writing to cuc_comment everywhere (T233004)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 21:50 zabe@deploy1002: Started scap: Backport for [[gerrit:886149{{!}}Stop writing to cuc_comment everywhere (T233004)]]
* 21:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1078.eqiad.wmnet with OS bullseye
* 21:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
* 21:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
* 21:30 brennen: end of utc late backport & config window
* 21:30 brennen@deploy1002: Finished scap: Backport for [[gerrit:886118{{!}}Enable client preferences everywhere (T327979)]] (duration: 11m 14s)
* 21:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
* 21:22 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS bullseye
* 21:22 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp1077.eqiad.wmnet
* 21:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1077.eqiad.wmnet with OS bullseye
* 21:21 brennen@deploy1002: brennen and nray: Backport for [[gerrit:886118{{!}}Enable client preferences everywhere (T327979)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 21:20 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
* 21:19 brennen@deploy1002: Started scap: Backport for [[gerrit:886118{{!}}Enable client preferences everywhere (T327979)]]
* 21:18 brennen@deploy1002: Finished scap: Backport for [[gerrit:885359{{!}}Disable write old for CheckUserLog reason everywhere (T233004)]] (duration: 12m 02s)
* 21:07 brennen@deploy1002: brennen and dreamyjazz: Backport for [[gerrit:885359{{!}}Disable write old for CheckUserLog reason everywhere (T233004)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 21:06 brennen@deploy1002: Started scap: Backport for [[gerrit:885359{{!}}Disable write old for CheckUserLog reason everywhere (T233004)]]
* 20:59 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1078.eqiad.wmnet with OS bullseye
* 20:59 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1078.eqiad.wmnet with OS bullseye
* 20:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
* 20:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
* 20:28 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1078.eqiad.wmnet with OS bullseye
* 20:28 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1077.eqiad.wmnet with OS bullseye
* 20:23 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.3-1+deb11u1_amd64.changes  # [[phab:T328280|T328280]]
* 20:21 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.3-1_amd64.changes  # [[phab:T328280|T328280]]
* 20:11 zabe@deploy1002: Finished scap: Backport for [[gerrit:886135{{!}}Stop writing to cuc_user and cuc_user_text everywhere (T233004)]] (duration: 09m 39s)
* 20:03 zabe@deploy1002: zabe: Backport for [[gerrit:886135{{!}}Stop writing to cuc_user and cuc_user_text everywhere (T233004)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 20:02 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic2037.codfw.wmnet
* 20:01 zabe@deploy1002: Started scap: Backport for [[gerrit:886135{{!}}Stop writing to cuc_user and cuc_user_text everywhere (T233004)]]
* 19:55 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
* 19:54 ryankemper: [[phab:T328674|T328674]] [Elastic] With puppet disabled on elastic* fleet, `ryankemper@elastic2037:~$ sudo run-puppet-agent --force` to verify changes in https://gerrit.wikimedia.org/r/886055
* 19:30 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.21 refs [[phab:T325584|T325584]]
* 19:28 zabe@deploy1002: say aborted:  (duration: 00m 03s)
* 18:42 zabe@deploy1002: Finished scap: Backport for [[gerrit:886127{{!}}Stop writing to cuc_comment in group1 wikis (T233004)]] (duration: 08m 19s)
* 18:36 zabe@deploy1002: zabe: Backport for [[gerrit:886127{{!}}Stop writing to cuc_comment in group1 wikis (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 18:34 zabe@deploy1002: Started scap: Backport for [[gerrit:886127{{!}}Stop writing to cuc_comment in group1 wikis (T233004)]]
* 18:08 aokoth@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Production (gitlab1004) to 15.7.6-ce.0
* 18:08 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
* 18:08 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
* 18:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2043.codfw.wmnet with OS bullseye
* 18:07 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
* 18:06 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
* 18:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
* 18:05 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
* 18:03 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1037.eqiad.wmnet with OS bullseye
* 17:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2043.codfw.wmnet with reason: host reimage
* 17:49 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2043.codfw.wmnet with reason: host reimage
* 17:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
* 17:45 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
* 17:33 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc2043.codfw.wmnet with OS bullseye
* 17:32 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1037.eqiad.wmnet with OS bullseye
* 17:29 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Production (gitlab1004) to 15.7.6-ce.0
* 17:12 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 17:12 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 16:53 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 16:52 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 16:51 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 16:50 dancy@deploy1002: Installation of scap version "4.34.0" completed for 561 hosts
* 16:50 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 16:50 dancy@deploy1002: Installing scap version "4.34.0" for 561 hosts
* 16:50 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 16:49 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 16:48 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 16:48 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 16:47 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 16:46 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 16:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2007.codfw.wmnet
* 16:18 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
* 16:17 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
* 16:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2007.codfw.wmnet
* 16:17 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
* 16:16 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
* 16:16 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 16:15 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 16:10 volans: uploaded python3-wmflib_1.2.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica gitlab2002 to 15.7.6-ce.0
* 15:40 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@e38efa6] (releasing): (no justification provided) (duration: 07m 01s)
* 15:38 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
* 15:37 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
* 15:35 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
* 15:35 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
* 15:34 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica gitlab2002 to 15.7.6-ce.0
* 15:33 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@e38efa6] (releasing): (no justification provided)
* 15:24 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti3004
* 15:17 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti3004
* 15:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2006.codfw.wmnet
* 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004 was renamed as ganeti4004 - jmm@cumin2002"
* 15:02 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3004 was renamed as ganeti4004 - jmm@cumin2002"
* 15:00 vgutierrez: rolling restart of varnish in cache::text - [[phab:T315676|T315676]]
* 14:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 14:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2006.codfw.wmnet
* 14:55 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 14:45 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 14:39 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 14:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2005.codfw.wmnet
* 14:29 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 14:25 moritzm: installing containerd security updates on codfw k8s nodes
* 14:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2005.codfw.wmnet
* 13:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=ats-be
* 13:34 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=cdn
* 13:10 kharlan: Deployed security patch for [[phab:T328643|T328643]]
* 13:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1076.eqiad.wmnet with OS bullseye
* 13:04 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 13:03 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
* 13:03 kharlan: Deployed security patch for [[phab:T328643|T328643]]
* 13:02 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 13:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2004.codfw.wmnet
* 13:00 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
* 12:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2004.codfw.wmnet
* 12:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
* 12:47 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 12:46 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
* 12:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
* 12:42 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:42 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
* 12:39 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 12:39 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
* 12:29 btullis@deploy1002: Finished deploy [analytics/superset/deploy@5175ad7]: Production deployment for numpy downgrade (duration: 00m 42s)
* 12:29 claime: Work ongoing on m2 and m3
* 12:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2003.codfw.wmnet
* 12:29 btullis@deploy1002: Started deploy [analytics/superset/deploy@5175ad7]: Production deployment for numpy downgrade
* 12:23 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1076.eqiad.wmnet with OS bullseye
* 12:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2003.codfw.wmnet
* 12:08 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 12:08 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
* 11:46 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 11:42 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
* 11:42 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
* 11:41 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
* 11:41 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
* 11:40 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
* 11:39 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
* 11:38 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
* 11:37 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
* 11:37 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix {{!}} tee [[phab:T328634|T328634]]-namespaceDupes-4.out # [[phab:T328634|T328634]] – made some progress then errored out again
* 11:32 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix --add-prefix=[[phab:T328634|T328634]]/ {{!}} tee [[phab:T328634|T328634]]-namespaceDupes-3.out # [[phab:T328634|T328634]] – seemed to finish the first 20 pages and then go into an infinite loop, I Ctrl+Ced it
* 11:28 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix --add-prefix=[[phab:T328634|T328634]]/ {{!}} tee [[phab:T328634|T328634]]-namespaceDupes-2.out # [[phab:T328634|T328634]] – another error but made more progress
* 11:23 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes.php shnwikibooks --fix {{!}} tee [[phab:T328634|T328634]]-namespaceDupes.out # [[phab:T328634|T328634]] – failed quickly, details in task
* 11:22 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
* 11:22 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
* 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
* 11:02 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
* 10:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2002.codfw.wmnet
* 10:19 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2002.codfw.wmnet
* 10:17 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
* 10:11 moritzm: restarting FPM on mw canaries to pick up tiff security updates
* 10:04 moritzm: installing tiff security updates
* 09:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aqs2001.codfw.wmnet
* 09:55 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
* 09:54 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
* 09:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host aqs2001.codfw.wmnet
* 09:40 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
* 09:40 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
* 09:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 398143
* 09:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 398143
* 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica gitlab1004 to 15.7.6
* 09:13 apergos: UTC morning backport and config training window done
* 09:13 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
* 09:12 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
* 09:11 elukey: roll restart of eventgate-main pods in wikikube eqiad/codfw to pick up new stream configs - [[phab:T328576|T328576]]
* 08:57 ariel@deploy1002: Finished scap: Backport for [[gerrit:885927{{!}}Enable wgMinervaEnableSiteNotice for bnwiktionary (T328630)]] (duration: 10m 56s)
* 08:48 ariel@deploy1002: ariel and aishik: Backport for [[gerrit:885927{{!}}Enable wgMinervaEnableSiteNotice for bnwiktionary (T328630)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:46 ariel@deploy1002: Started scap: Backport for [[gerrit:885927{{!}}Enable wgMinervaEnableSiteNotice for bnwiktionary (T328630)]]
* 08:39 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica gitlab1004 to 15.7.6
* 08:37 tgr@deploy1002: Finished scap: Backport for [[gerrit:885928{{!}}campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370)]], [[gerrit:885929{{!}}campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370)]] (duration: 14m 26s)
* 08:27 tgr@deploy1002: tgr: Backport for [[gerrit:885928{{!}}campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370)]], [[gerrit:885929{{!}}campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 08:23 tgr@deploy1002: Started scap: Backport for [[gerrit:885928{{!}}campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370)]], [[gerrit:885929{{!}}campaigns: Donor landing page translations for sv, it, ja, fr, nl (T321370)]]
* 06:17 kart_: Updated cxserver to 2023-02-02-004918-production ([[phab:T129470|T129470]], [[phab:T172035|T172035]], [[phab:T327842|T327842]])
* 06:16 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 06:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 06:13 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 06:12 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 06:09 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 06:09 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
* 04:00 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet
* 03:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5024.eqsin.wmnet with OS bullseye
* 03:21 ejegg: payments-wiki upgraded from {{Gerrit|f20a2208}} to {{Gerrit|53d1a58d}}
* 02:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
* 02:46 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
* 02:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
* 02:14 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5024.eqsin.wmnet with OS bullseye
* 01:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
* 01:55 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet
* 01:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5023.eqsin.wmnet with OS bullseye
* 01:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
* 01:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=cdn
* 01:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1075.eqiad.wmnet with OS bullseye
* 01:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
* 01:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
* 01:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
* 01:18 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
* 01:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp1075.eqiad.wmnet with OS bullseye
* 00:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5023.eqsin.wmnet with OS bullseye
* 00:06 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet
* 00:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5022.eqsin.wmnet with OS bullseye


== 2021-08-20 ==
== 2023-02-01 ==
* 23:17 legoktm: deployed patch for [[phab:T289385|T289385]]
* 23:45 zabe@deploy1002: Finished scap: Backport for [[gerrit:885908{{!}}Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004)]] (duration: 08m 07s)
* 17:03 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1141.eqiad.wmnet
* 23:39 zabe@deploy1002: zabe: Backport for [[gerrit:885908{{!}}Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 17:01 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1141.eqiad.wmnet
* 23:37 zabe@deploy1002: Started scap: Backport for [[gerrit:885908{{!}}Stop writing to cuc_user and cuc_user_text in group1 wikis (T233004)]]
* 16:58 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1140.eqiad.wmnet
* 23:31 rzl@cumin2002: dbctl commit (dc=all): 'Depool db2181', diff saved to https://phabricator.wikimedia.org/P43574 and previous config saved to /var/cache/conftool/dbconfig/20230201-233140-rzl.json
* 16:56 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1140.eqiad.wmnet
* 23:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
* 16:56 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1139.eqiad.wmnet
* 23:27 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
* 16:54 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1139.eqiad.wmnet
* 23:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: security release
* 16:45 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1134.eqiad.wmnet
* 23:17 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.21  refs [[phab:T325584|T325584]] (duration: 06m 57s)
* 16:43 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1134.eqiad.wmnet
* 23:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.21  refs [[phab:T325584|T325584]]
* 16:38 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1133.eqiad.wmnet
* 23:01 zabe@deploy1002: Finished scap: Backport for [[gerrit:885781{{!}}CachingKartographerEmbeddingHandler: Fall back to Special:BlankPage title (T328601)]] (duration: 07m 45s)
* 16:36 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1133.eqiad.wmnet
* 22:55 zabe@deploy1002: zabe: Backport for [[gerrit:885781{{!}}CachingKartographerEmbeddingHandler: Fall back to Special:BlankPage title (T328601)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
* 15:37 jayme: deleting various pods from staging to have them recreated with priorities - [[phab:T289131|T289131]]
* 22:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
* 15:25 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1129.eqiad.wmnet
* 22:53 zabe@deploy1002: Started scap: Backport for [[gerrit:885781{{!}}CachingKartographerEmbeddingHandler: Fall back to Special:BlankPage title (T328601)]]
* 15:23 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1129.eqiad.wmnet
* 22:49 zabe@deploy1002: Finished scap: Backport for [[gerrit:885898{{!}}Stop writing to cuc_comment_id in group0 wikis (T233004)]] (duration: 13m 03s)
* 15:14 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 22:47 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
* 14:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2021.codfw.wmnet with reason: REIMAGE
* 22:40 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5022.eqsin.wmnet with OS bullseye
* 14:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2021.codfw.wmnet with reason: REIMAGE
* 22:38 zabe@deploy1002: zabe: Backport for [[gerrit:885898{{!}}Stop writing to cuc_comment_id in group0 wikis (T233004)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 13:54 jelto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:36 zabe@deploy1002: Started scap: Backport for [[gerrit:885898{{!}}Stop writing to cuc_comment_id in group0 wikis (T233004)]]
* 13:48 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 22:32 kindrobot: close UTC late backport window
* 12:00 jayme: enabled priority admission plugin on k8s staging, rolling restart all pods in kube-system namespace - [[phab:T289131|T289131]]
* 22:31 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:885841{{!}}Enable client preferences for group1 (T327979)]] (duration: 10m 37s)
* 11:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 22:22 kindrobot@deploy1002: nray and kindrobot: Backport for [[gerrit:885841{{!}}Enable client preferences for group1 (T327979)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 10:35 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 22:21 kindrobot@deploy1002: Started scap: Backport for [[gerrit:885841{{!}}Enable client preferences for group1 (T327979)]]
* 09:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1001.eqiad.wmnet
* 22:14 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:885852{{!}}Enable Linter write namespace, tag and template for all wikis (T299612)]] (duration: 18m 14s)
* 09:32 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 21:57 kindrobot@deploy1002: kindrobot and sbailey: Backport for [[gerrit:885852{{!}}Enable Linter write namespace, tag and template for all wikis (T299612)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 09:23 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts druid1001.eqiad.wmnet
* 21:57 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore100*: Applying new TLS certificates — [[phab:T327675|T327675]] - eevans@cumin1001
* 08:48 godog: roll depool/pool thanos-fe to apply swift change - [[phab:T288815|T288815]]
* 21:56 kindrobot@deploy1002: Started scap: Backport for [[gerrit:885852{{!}}Enable Linter write namespace, tag and template for all wikis (T299612)]]
* 08:43 godog: temp depool thanos-fe2003 to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/713815
* 21:53 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
* 08:43 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on druid1001.eqiad.wmnet with reason: decommissioning druid1001
* 21:52 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:885358{{!}}Disable write old for CheckUserLog reason on group 0 (T233004)]] (duration: 14m 53s)
* 08:43 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on druid1001.eqiad.wmnet with reason: decommissioning druid1001
* 21:43 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
* 07:14 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE
* 21:39 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore100*: Applying new TLS certificates — [[phab:T327675|T327675]] - eevans@cumin1001
* 07:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
* 21:39 kindrobot@deploy1002: dreamyjazz and kindrobot: Backport for [[gerrit:885358{{!}}Disable write old for CheckUserLog reason on group 0 (T233004)]] synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 07:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
* 21:37 kindrobot@deploy1002: Started scap: Backport for [[gerrit:885358{{!}}Disable write old for CheckUserLog reason on group 0 (T233004)]]
* 07:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: REIMAGE
* 21:32 kindrobot@deploy1002: Finished scap: Backport for [[gerrit:865214{{!}}Disable wgParserEnableLegacyMediaDOM on group1 wikis (T314318)]] (duration: 13m 56s)
* 07:09 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2019.codfw.wmnet with reason: REIMAGE
* 21:26 eevans@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 07:08 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: REIMAGE
* 21:26 eevans@puppetmaster1001: conftool action : get/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 06:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 21:26 eevans@puppetmaster1001: conftool action : get/pooled=true; selector: dnsdisc=sessionstore,name=codfw
* 06:07 TimStarling: sending election email to 44k people
* 21:24 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
* 03:15 legoktm@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Score/scripts/removeTagline.php: removeTagline: Set explicit pcre.backtrack_limit ([[phab:T289298|T289298]]) (duration: 00m 58s)
* 21:20 kindrobot@deploy1002: arlolra and kindrobot: Backport for [[gerrit:865214{{!}}Disable wgParserEnableLegacyMediaDOM on group1 wikis (T314318)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:19 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore200*: Applying new TLS certificates — [[phab:T327675|T327675]] - eevans@cumin1001
* 03:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:18 kindrobot@deploy1002: Started scap: Backport for [[gerrit:865214{{!}}Disable wgParserEnableLegacyMediaDOM on group1 wikis (T314318)]]
* 00:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:14 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3065.esams.wmnet
* 00:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3065.esams.wmnet with OS bullseye
* 00:13 tstarling@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/SecurePoll/cli/wm-scripts/makeMailingList.php: code that uses said hack (duration: 00m 57s)
* 21:03 kindrobot: start UTC late backport deployment window
* 00:12 tstarling@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/SecurePoll/includes/User/LocalAuth.php: hack for mailout (duration: 00m 58s)
* 21:02 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore200*: Applying new TLS certificates — [[phab:T327675|T327675]] - eevans@cumin1001
* 00:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3065.esams.wmnet with reason: host reimage
* 00:08 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:44 eevans@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
* 20:43 urandom: depooling sessionstore —codfw— in preparation for Cassandra restarts — [[phab:T327675|T327675]]
* 20:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3065.esams.wmnet with reason: host reimage
* 20:40 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3064.esams.wmnet
* 20:38 eevans@puppetmaster1001: conftool action : get/pooled; selector: dnsdisc=$SERVICE,name=$DC
* 20:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3064.esams.wmnet with OS bullseye
* 20:22 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3065.esams.wmnet with OS bullseye
* 20:21 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3063.esams.wmnet
* 20:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3064.esams.wmnet with reason: host reimage
* 20:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3063.esams.wmnet with OS bullseye
* 20:08 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3064.esams.wmnet with reason: host reimage
* 20:03 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5031.eqsin.wmnet,service=ats-be
* 20:03 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5031.eqsin.wmnet,service=cdn
* 20:00 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5031.eqsin.wmnet with OS bullseye
* 19:53 dancy: The train is blocked on [[phab:T328601|T328601]]
* 19:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3064.esams.wmnet with OS bullseye
* 19:49 dancy@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.20  refs [[phab:T325584|T325584]] (duration: 06m 36s)
* 19:49 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3062.esams.wmnet
* 19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS bullseye
* 19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3063.esams.wmnet with reason: host reimage
* 19:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3063.esams.wmnet with reason: host reimage
* 19:42 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.20  refs [[phab:T325584|T325584]]
* 19:41 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=ats-be
* 19:41 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet,service=cdn
* 19:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5021.eqsin.wmnet with OS bullseye
* 19:33 dancy@deploy1002: deploy-promote aborted:  (duration: 11m 58s)
* 19:33 dancy@deploy1002: sync-file aborted: group1 wikis to 1.40.0-wmf.21  refs [[phab:T325584|T325584]] (duration: 03m 38s)
* 19:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5031.eqsin.wmnet with reason: host reimage
* 19:29 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.21  refs [[phab:T325584|T325584]]
* 19:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5031.eqsin.wmnet with reason: host reimage
* 19:26 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
* 19:24 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3063.esams.wmnet with OS bullseye
* 19:24 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3061.esams.wmnet
* 19:24 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
* 19:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3061.esams.wmnet with OS bullseye
* 19:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
* 19:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS bullseye
* 19:02 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3060.esams.wmnet
* 19:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3060.esams.wmnet with OS bullseye
* 19:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
* 18:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
* 18:55 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS bullseye
* 18:55 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5031.eqsin.wmnet with OS bullseye
* 18:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
* 18:47 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS bullseye
* 18:46 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5031.eqsin.wmnet with OS bullseye
* 18:39 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts puppetmaster2003.codfw.wmnet
* 18:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
* 18:37 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS bullseye
* 18:35 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
* 18:32 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3061.esams.wmnet with OS bullseye
* 18:31 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3059.esams.wmnet
* 18:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3059.esams.wmnet with OS bullseye
* 18:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
* 18:29 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster2003.codfw.wmnet
* 18:29 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5021.eqsin.wmnet with OS bullseye
* 18:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
* 18:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp1075.eqiad.wmnet with reason: downtimed for idrac firmware testing
* 18:20 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp1075.eqiad.wmnet with reason: downtimed for idrac firmware testing
* 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5030.eqsin.wmnet,service=ats-be
* 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5030.eqsin.wmnet,service=cdn
* 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=ats-be
* 18:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet,service=cdn
* 18:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3060.esams.wmnet with OS bullseye
* 18:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3058.esams.wmnet
* 18:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3058.esams.wmnet with OS bullseye
* 18:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5030.eqsin.wmnet with OS bullseye
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43573 and previous config saved to /var/cache/conftool/dbconfig/20230201-181036-root.json
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43572 and previous config saved to /var/cache/conftool/dbconfig/20230201-181031-root.json
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43571 and previous config saved to /var/cache/conftool/dbconfig/20230201-181024-root.json
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43570 and previous config saved to /var/cache/conftool/dbconfig/20230201-181016-root.json
* 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43569 and previous config saved to /var/cache/conftool/dbconfig/20230201-181011-root.json
* 18:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
* 18:03 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
* 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43568 and previous config saved to /var/cache/conftool/dbconfig/20230201-175531-root.json
* 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43567 and previous config saved to /var/cache/conftool/dbconfig/20230201-175526-root.json
* 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43566 and previous config saved to /var/cache/conftool/dbconfig/20230201-175519-root.json
* 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43565 and previous config saved to /var/cache/conftool/dbconfig/20230201-175511-root.json
* 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43564 and previous config saved to /var/cache/conftool/dbconfig/20230201-175506-root.json
* 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P43563 and previous config saved to /var/cache/conftool/dbconfig/20230201-175446-root.json
* 17:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
* 17:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
* 17:41 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3059.esams.wmnet with OS bullseye
* 17:40 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3057.esams.wmnet
* 17:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3057.esams.wmnet with OS bullseye
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43562 and previous config saved to /var/cache/conftool/dbconfig/20230201-174026-root.json
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43561 and previous config saved to /var/cache/conftool/dbconfig/20230201-174021-root.json
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43560 and previous config saved to /var/cache/conftool/dbconfig/20230201-174015-root.json
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43559 and previous config saved to /var/cache/conftool/dbconfig/20230201-174007-root.json
* 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43558 and previous config saved to /var/cache/conftool/dbconfig/20230201-174001-root.json
* 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P43557 and previous config saved to /var/cache/conftool/dbconfig/20230201-173941-root.json
* 17:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5030.eqsin.wmnet with reason: host reimage
* 17:36 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5030.eqsin.wmnet with reason: host reimage
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43555 and previous config saved to /var/cache/conftool/dbconfig/20230201-172521-root.json
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43554 and previous config saved to /var/cache/conftool/dbconfig/20230201-172516-root.json
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43553 and previous config saved to /var/cache/conftool/dbconfig/20230201-172510-root.json
* 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43552 and previous config saved to /var/cache/conftool/dbconfig/20230201-172502-root.json
* 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43551 and previous config saved to /var/cache/conftool/dbconfig/20230201-172456-root.json
* 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P43550 and previous config saved to /var/cache/conftool/dbconfig/20230201-172436-root.json
* 17:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3058.esams.wmnet with OS bullseye
* 17:22 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3056.esams.wmnet
* 17:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3056.esams.wmnet with OS bullseye
* 17:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
* 17:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5019.eqsin.wmnet with OS bullseye
* 17:15 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
* 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43549 and previous config saved to /var/cache/conftool/dbconfig/20230201-171016-root.json
* 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43548 and previous config saved to /var/cache/conftool/dbconfig/20230201-171011-root.json
* 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43547 and previous config saved to /var/cache/conftool/dbconfig/20230201-171005-root.json
* 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43546 and previous config saved to /var/cache/conftool/dbconfig/20230201-170957-root.json
* 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43545 and previous config saved to /var/cache/conftool/dbconfig/20230201-170951-root.json
* 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P43544 and previous config saved to /var/cache/conftool/dbconfig/20230201-170931-root.json
* 16:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
* 16:57 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5030.eqsin.wmnet with OS bullseye
* 16:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
* 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43543 and previous config saved to /var/cache/conftool/dbconfig/20230201-165512-root.json
* 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43542 and previous config saved to /var/cache/conftool/dbconfig/20230201-165506-root.json
* 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43541 and previous config saved to /var/cache/conftool/dbconfig/20230201-165500-root.json
* 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43540 and previous config saved to /var/cache/conftool/dbconfig/20230201-165452-root.json
* 16:54 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
* 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43539 and previous config saved to /var/cache/conftool/dbconfig/20230201-165446-root.json
* 16:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3057.esams.wmnet with OS bullseye
* 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P43538 and previous config saved to /var/cache/conftool/dbconfig/20230201-165426-root.json
* 16:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
* 16:42 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5030.eqsin.wmnet with OS bullseye
* 16:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43536 and previous config saved to /var/cache/conftool/dbconfig/20230201-164007-root.json
* 16:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43535 and previous config saved to /var/cache/conftool/dbconfig/20230201-164002-root.json
* 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43534 and previous config saved to /var/cache/conftool/dbconfig/20230201-163955-root.json
* 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2146 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43533 and previous config saved to /var/cache/conftool/dbconfig/20230201-163947-root.json
* 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2136 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P43532 and previous config saved to /var/cache/conftool/dbconfig/20230201-163941-root.json
* 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2106 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P43531 and previous config saved to /var/cache/conftool/dbconfig/20230201-163921-root.json
* 16:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
* 16:33 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3056.esams.wmnet with OS bullseye
* 16:31 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5030.eqsin.wmnet with OS bullseye
* 16:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
* 16:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
* 16:25 jynus: reloaded apache on mailman
* 16:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS bullseye
* 16:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
* 16:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 16:15 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
* 16:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
* 16:14 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
* 16:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
* 15:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
* 15:51 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5019.eqsin.wmnet with OS bullseye
* 15:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
* 14:56 sukhe: cp1075.eqiad.wmnet for idrac firmware upgrade testing
* 14:55 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=ats-be
* 14:55 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp1075.eqiad.wmnet,service=cdn
* 14:52 awight: EU deployment window complete
* 14:48 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:48 awight@deploy1002: Finished scap: Backport for [[gerrit:884155{{!}}wmf-config: add new revision-score streams for EventGate main (T317768)]] (duration: 08m 25s)
* 14:47 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:41 awight@deploy1002: elukey and awight: Backport for [[gerrit:884155{{!}}wmf-config: add new revision-score streams for EventGate main (T317768)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2136 db2158 db2157 es2026 db2106 db2146 [[phab:T327404|T327404]]', diff saved to https://phabricator.wikimedia.org/P43530 and previous config saved to /var/cache/conftool/dbconfig/20230201-144152-root.json
* 14:40 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:40 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:40 awight@deploy1002: Started scap: Backport for [[gerrit:884155{{!}}wmf-config: add new revision-score streams for EventGate main (T317768)]]
* 14:39 ayounsi@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 14:39 ayounsi@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 14:37 awight@deploy1002: Finished scap: Backport for [[gerrit:885391{{!}}Add cswiki to desktop-improvements group. (T328154)]] (duration: 09m 22s)
* 14:29 awight@deploy1002: jdrewniak and awight: Backport for [[gerrit:885391{{!}}Add cswiki to desktop-improvements group. (T328154)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 14:28 awight@deploy1002: Started scap: Backport for [[gerrit:885391{{!}}Add cswiki to desktop-improvements group. (T328154)]]
* 14:26 awight@deploy1002: Finished scap: Backport for [[gerrit:885798{{!}}Squashed diff to catch up to master]] (duration: 09m 07s)
* 14:19 awight@deploy1002: awight and mlitn: Backport for [[gerrit:885798{{!}}Squashed diff to catch up to master]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
* 14:17 awight@deploy1002: Started scap: Backport for [[gerrit:885798{{!}}Squashed diff to catch up to master]]
* 14:11 awight@deploy1002: backport aborted:  (duration: 06m 09s)
* 14:11 awight@deploy1002: sync-world aborted: Backport for [[gerrit:885798{{!}}Squashed diff to catch up to master]] (duration: 03m 36s)
* 14:09 awight@deploy1002: mlitn and awight: Backport for [[gerrit:885798{{!}}Squashed diff to catch up to master]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 14:07 awight@deploy1002: Started scap: Backport for [[gerrit:885798{{!}}Squashed diff to catch up to master]]
* 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3005.wikimedia.org
* 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 14:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3005.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 14:06 moritzm: updating perf on Bullseye hosts
* 14:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3005.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:51 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3005.wikimedia.org
* 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast5002.wikimedia.org
* 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:47 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 13:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 13:36 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast5002.wikimedia.org
* 13:21 moritzm: installing curl security updates on bullseye
* 13:00 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
* 12:59 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2003.codfw.wmnet
* 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 12:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 12:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 12:27 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2003.codfw.wmnet
* 12:16 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for testvm2002.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
* 12:15 jmm@cumin2002: START - Cookbook sre.puppet.renew-cert for testvm2002.codfw.wmnet: Renew puppet certificate - jmm@cumin2002
* 11:29 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move CirrusSearch settings from IS.php to ext-CirrusSearch.php, part III ([[phab:T308932|T308932]]) (duration: 06m 43s)
* 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
* 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
* 11:22 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@e1ca693] (codfw): Allow stylesheets through CSP (duration: 01m 45s)
* 11:21 ladsgroup@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Move CirrusSearch settings from IS.php to ext-CirrusSearch.php, part II ([[phab:T308932|T308932]]) (duration: 07m 04s)
* 11:21 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:20 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@e1ca693] (codfw): Allow stylesheets through CSP
* 11:17 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 11:17 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@e1ca693] (eqiad): Allow stylesheets through CSP (duration: 00m 51s)
* 11:16 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@e1ca693] (eqiad): Allow stylesheets through CSP
* 11:14 ladsgroup@deploy1002: Synchronized wmf-config/ext-CirrusSearch.php: Move CirrusSearch settings from IS.php to ext-CirrusSearch.php, part I ([[phab:T308932|T308932]]) (duration: 07m 04s)
* 11:01 stevemunene@deploy1002: Finished deploy [analytics/refinery@a8840b0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a8840b0] (duration: 01m 18s)
* 11:00 stevemunene@deploy1002: Started deploy [analytics/refinery@a8840b0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a8840b0]
* 10:59 stevemunene@deploy1002: Finished deploy [analytics/refinery@a8840b0] (thin): Regular analytics weekly train THIN [analytics/refinery@a8840b0] (duration: 00m 05s)
* 10:59 stevemunene@deploy1002: Started deploy [analytics/refinery@a8840b0] (thin): Regular analytics weekly train THIN [analytics/refinery@a8840b0]
* 10:58 stevemunene@deploy1002: Finished deploy [analytics/refinery@a8840b0]: Regular analytics weekly train [analytics/refinery@a8840b0] (duration: 04m 29s)
* 10:54 stevemunene@deploy1002: Started deploy [analytics/refinery@a8840b0]: Regular analytics weekly train [analytics/refinery@a8840b0]
* 10:52 steve_munene: Deploying refinery for ops week
* 10:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:42 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:42 zabe: start running migrateRevisionCommentTemp in remaining sections (for now except s3) in screens # [[phab:T275246|T275246]]
* 10:42 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 10:42 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 10:41 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
* 10:41 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
* 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host krb2002.codfw.wmnet with OS bullseye
* 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
* 10:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb2002.codfw.wmnet with reason: host reimage
* 10:01 godog: upgrade grafana to 8.5.20 on cloudmetrics* - [[phab:T328405|T328405]]
* 09:57 godog: upgrade grafana to 8.5.20 on grafana1002 - [[phab:T328405|T328405]]
* 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host krb2002.codfw.wmnet with OS bullseye
* 09:47 godog: upgrade grafana to 8.5.20 on grafana2001 - [[phab:T328405|T328405]]
* 09:15 urbanecm: Clean sign up throttle for IP 195.113.145.2 (via resetAuthenticationThrottle.php; [[phab:T328521|T328521]])
* 09:14 urbanecm@deploy1002: Finished scap: Backport for [[gerrit:885734{{!}}Add new throttle rule (T328521)]] (duration: 07m 24s)
* 09:07 urbanecm@deploy1002: Started scap: Backport for [[gerrit:885734{{!}}Add new throttle rule (T328521)]]
* 09:06 urbanecm@deploy1002: backport aborted:  (duration: 00m 01s)
* 09:05 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:883620{{!}}Create additional namespaces on shn.wikibooks (T327850)]] (duration: 15m 06s)
* 08:54 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: apply on main
* 08:54 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
* 08:52 ladsgroup@deploy1002: superpes and ladsgroup: Backport for [[gerrit:883620{{!}}Create additional namespaces on shn.wikibooks (T327850)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
* 08:50 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:883620{{!}}Create additional namespaces on shn.wikibooks (T327850)]]
* 08:49 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:885321{{!}}Add a wordmark to trwiktionary (T328499)]] (duration: 08m 05s)
* 08:45 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=k8s-ingress-staging
* 08:45 jayme@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=k8s-ingress-staging
* 08:42 ladsgroup@deploy1002: superpes and ladsgroup: Backport for [[gerrit:885321{{!}}Add a wordmark to trwiktionary (T328499)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
* 08:41 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:885321{{!}}Add a wordmark to trwiktionary (T328499)]]
* 08:40 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:884934{{!}}Add mobile wordmark to cswiktionary (T328357)]] (duration: 12m 26s)
* 08:29 ladsgroup@deploy1002: superpes and ladsgroup: Backport for [[gerrit:884934{{!}}Add mobile wordmark to cswiktionary (T328357)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
* 08:27 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:884934{{!}}Add mobile wordmark to cswiktionary (T328357)]]
* 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
* 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
* 08:27 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:879926{{!}}Remove former EventLogging streams for navtiming (T281103 T286703 T308621 T323623)]] (duration: 09m 42s)
* 08:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
* 08:19 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 6 hosts
* 08:19 ladsgroup@deploy1002: ladsgroup and krinkle: Backport for [[gerrit:879926{{!}}Remove former EventLogging streams for navtiming (T281103 T286703 T308621 T323623)]] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
* 08:17 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:879926{{!}}Remove former EventLogging streams for navtiming (T281103 T286703 T308621 T323623)]]
* 08:14 ladsgroup@deploy1002: Finished scap: Backport for [[gerrit:726854{{!}}Remove unused eventlogging_RUMSpeedIndex stream (T286700)]] (duration: 10m 15s)
* 08:06 ladsgroup@deploy1002: phedenskog and ladsgroup: Backport for [[gerrit:726854{{!}}Remove unused eventlogging_RUMSpeedIndex stream (T286700)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
* 08:05 moritzm: installing libarchive security updates
* 08:04 ladsgroup@deploy1002: Started scap: Backport for [[gerrit:726854{{!}}Remove unused eventlogging_RUMSpeedIndex stream (T286700)]]
* 08:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 55821
* 07:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 55821
* 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P43524 and previous config saved to /var/cache/conftool/dbconfig/20230201-073348-ladsgroup.json
* 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P43523 and previous config saved to /var/cache/conftool/dbconfig/20230201-071841-ladsgroup.json
* 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P43522 and previous config saved to /var/cache/conftool/dbconfig/20230201-070335-ladsgroup.json
* 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P43521 and previous config saved to /var/cache/conftool/dbconfig/20230201-064828-ladsgroup.json
* 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 ([[phab:T310011|T310011]])', diff saved to https://phabricator.wikimedia.org/P43520 and previous config saved to /var/cache/conftool/dbconfig/20230201-064311-ladsgroup.json
* 06:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 06:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 06:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 06:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
* 00:38 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3055.esams.wmnet
* 00:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3055.esams.wmnet with OS bullseye
* 00:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
* 00:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3055.esams.wmnet with reason: host reimage
* 00:02 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp3054.esams.wmnet
* 00:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3054.esams.wmnet with OS bullseye


== 2021-08-19 ==
* 23:15 brennen: ended backport & config window early, as no patches were scheduled and no new attendees for this week
* 22:42 ejegg: updated payments-wiki from {{Gerrit|0a27dbe9b6}} to {{Gerrit|564daed816}}
* 21:20 Amir1: ladsgroup@mwmaint2002:~$ mwscript extensions/FlaggedRevs/maintenance/pruneRevData.php --wiki=huwiki --prune ([[phab:T289249|T289249]])
* 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.19
* 19:07 razzi@deploy1002: Finished deploy [analytics/aqs/deploy@57c253e]: Deploy aqs {{Gerrit|9c062f2}} (duration: 03m 30s)
* 19:03 razzi@deploy1002: Started deploy [analytics/aqs/deploy@57c253e]: Deploy aqs {{Gerrit|9c062f2}}
* 18:27 razzi: Beginning aqs deploy process
* 18:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon2001.codfw.wmnet
* 17:49 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon2001.codfw.wmnet
* 17:48 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafkamon1001.eqiad.wmnet
* 17:41 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts kafkamon1001.eqiad.wmnet
* 17:11 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1004.eqiad.wmnet
* 17:01 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1004.eqiad.wmnet
* 17:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1003.eqiad.wmnet
* 16:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:49 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Re-enable Score with Shellbox on most public wikis ([[phab:T257066|T257066]]) (duration: 01m 08s)
* 16:46 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1003.eqiad.wmnet
* 16:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1002.eqiad.wmnet
* 16:31 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1002.eqiad.wmnet
* 16:31 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts maps1002.eqiad.wmnet
* 16:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1002.eqiad.wmnet
* 16:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts maps1001.eqiad.wmnet
* 16:14 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts maps1001.eqiad.wmnet
* 16:14 hnowlan: starting decommission of old eqiad maps hardware
* 16:10 cwhite: remove rotated logstash-plain-* and logstash-json-* logs on logstash collectors
* 16:00 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:53 dpifke@deploy1002: Finished deploy [performance/navtiming@f8bf39f]: Deploy CpuBenchmark processor again [[phab:T281243|T281243]] (duration: 00m 06s)
* 15:52 dpifke@deploy1002: Started deploy [performance/navtiming@f8bf39f]: Deploy CpuBenchmark processor again [[phab:T281243|T281243]]
* 15:50 Amir1: test2wiki)> delete from flaggedtemplates where ft_rev_id not in (select fp_stable from flaggedpages); ([[phab:T289249|T289249]])
* 15:42 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet
* 15:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet
* 15:38 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet
* 15:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet
* 15:29 jiji@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:25 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps[1001-1004].eqiad.wmnet with reason: Awaiting decommissioning
* 15:06 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps[1001-1004].eqiad.wmnet with reason: Awaiting decommissioning
* 15:04 godog: clean logstash json logs off logstash hosts
* 14:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:49 jiji@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 14:36 effie: enable puppet on mediawiki and memcached servers for 713842
* 14:26 effie: disable puppet on mediawiki and memcached servers for 713842
* 13:58 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:49 urbanecm: Start server-side upload for 1 video file ([[phab:T288384|T288384]])
* 13:48 urbanecm: Start server-side upload for 1 video file ([[phab:T288554|T288554]])
* 13:47 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:47 kharlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 13:45 urbanecm: Start server-side upload for 1 video file ([[phab:T288628|T288628]])
* 13:44 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
* 13:44 kharlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
* 13:42 urbanecm: Start server-side upload for 1 video file ([[phab:T289203|T289203]])
* 13:40 kharlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
* 13:34 kormat: reconfiguring replication tree on pc3 [[phab:T284825|T284825]]
* 13:30 kormat: reconfiguring replication tree on pc2 [[phab:T284825|T284825]]
* 13:24 kormat: reconfiguring replication tree on pc1 [[phab:T284825|T284825]]
* 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:09 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote new h/w to primary of eqiad pc sections [[phab:T284825|T284825]] (duration: 01m 08s)
* 12:35 zpapierski@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:11 Lucas_WMDE: EU backport+config window done
* 12:11 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Wikibase/view/lib/wikibase-termbox/: Backport: [[gerrit:713523{{!}}Update termbox (T236893, T286775)]] (duration: 01m 08s)
* 11:56 zpapierski@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:49 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:713824{{!}}Revert "Don't set termbox v2 tags yet" (T236893, T286775)]] (duration: 01m 06s)
* 11:40 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/Wikibase/view/lib/wikibase-termbox/: Backport: [[gerrit:713513{{!}}Update termbox (T236893, T286775)]] (duration: 01m 08s)
* 11:39 lucaswerkmeister-wmde@deploy1002: sync-file aborted: Backport: [[gerrit:713513{{!}}Update termbox (T236893T286775)]] (duration: 00m 01s)
* 11:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:45 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:42 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:36 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 10:12 twentyafterfour: restart php-fpm on phab1001
* 10:02 godog: roll-reload nginx on ms-fe to apply config change
* 08:48 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:48 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:41 dzahn@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'miscweb' for release 'main' .
* 04:20 effie: pool mw2383 - [[phab:T286463|T286463]]
* 01:13 ejegg: updated fundraising CiviCRM from {{Gerrit|73f6ec9190}} to {{Gerrit|8ed303f2d1}}
* 00:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 00:40 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
 
== 2021-08-18 ==
* 22:16 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@26480d5]: fully enable imagerec data shipping (duration: 02m 09s)
* 22:14 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@26480d5]: fully enable imagerec data shipping
* 21:15 jgleeson: civicrm changed from {{Gerrit|66568246a2}} to {{Gerrit|73f6ec9190}}
* 19:40 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@8d71e72]: configuration for imagerec data shipping (duration: 02m 12s)
* 19:38 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@8d71e72]: configuration for imagerec data shipping
* 19:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:09 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.19 (duration: 01m 05s)
* 19:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.19
* 18:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:18 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:16 legoktm: Successfully published image docker-registry.discovery.wmnet/nodejs12-devel:0.0.1, docker-registry.discovery.wmnet/nodejs12-slim:0.0.1 ([[phab:T284346|T284346]])
* 18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|559dd701a5859223afd49aaa33ddab70e8ebe721}}: Enable page previews on German Wikivoyage ([[phab:T264305|T264305]]) (duration: 01m 08s)
* 18:09 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|35113b617b3540242ac69a8285c54c70041bc14b}}: Enable DiscussionTools topicsubscription as beta feature on phase 1 wikis ([[phab:T287800|T287800]]) (duration: 01m 25s)
* 16:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 15:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:46 ejegg: updated matching gift employers list on payments-wiki
* 15:43 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 14:50 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:26 effie: enable puppet on alert*
* 14:11 effie: disable puppet on alerts* to avoid alert flood due to 713494
* 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:57 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:713619{{!}}ProductionServices: change rdb* servers in eqiad and codfw (T280582)]] (duration: 01m 51s)
* 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:41 godog: bounce logstash on logstash100[89]
* 13:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:24 effie: mw2383 is depooled - [[phab:T286463|T286463]]
* 13:01 kormat: Deploying wmfmariadbpy 0.7.2 [[phab:T289139|T289139]]
* 13:01 kormat: uploaded wmfmariadbpy 0.7.2 to apt.wm.o
* 11:38 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:36 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:35 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 11:12 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet
* 11:03 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:47 effie: pooling mw2383 - [[phab:T286463|T286463]]
* 10:41 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 10:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1004.eqiad.wmnet with reason: Awaiting decommissioning
* 10:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1004.eqiad.wmnet with reason: Awaiting decommissioning
* 10:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
* 10:10 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: REIMAGE
* 09:36 joal@deploy1002: Finished deploy [analytics/refinery@88c6618] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@88c6618] (duration: 05m 48s)
* 09:30 joal@deploy1002: Started deploy [analytics/refinery@88c6618] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@88c6618]
* 09:30 joal@deploy1002: Finished deploy [analytics/refinery@88c6618] (thin): Regular analytics weekly train THIN [analytics/refinery@88c6618] (duration: 00m 07s)
* 09:30 joal@deploy1002: Started deploy [analytics/refinery@88c6618] (thin): Regular analytics weekly train THIN [analytics/refinery@88c6618]
* 09:29 joal@deploy1002: Finished deploy [analytics/refinery@88c6618]: Regular analytics weekly train [analytics/refinery@88c6618] (duration: 32m 29s)
* 08:57 joal@deploy1002: Started deploy [analytics/refinery@88c6618]: Regular analytics weekly train [analytics/refinery@88c6618]
* 04:38 marostegui: Drop user2 from s6 - [[phab:T289051|T289051]]
* 02:03 rzl@cumin2001: conftool action : get/pooled; selector: service=docker-registry
* 00:39 dpifke@deploy1002: Finished deploy [performance/navtiming@88f12a0]: Revert CpuBenchmark again ([[phab:T281243|T281243]]) (duration: 00m 05s)
* 00:39 dpifke@deploy1002: Started deploy [performance/navtiming@88f12a0]: Revert CpuBenchmark again ([[phab:T281243|T281243]])
* 00:38 dpifke@deploy1002: Finished deploy [performance/navtiming@88f12a0]: Re-deploy fixed CpuBenchmark ([[phab:T281243|T281243]]) (duration: 00m 06s)
* 00:38 dpifke@deploy1002: Started deploy [performance/navtiming@88f12a0]: Re-deploy fixed CpuBenchmark ([[phab:T281243|T281243]])
 
== 2021-08-17 ==
* 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:32 ebernhardson@deploy1002: Synchronized php-1.37.0-wmf.19/extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php: [[phab:T288233|T288233]]: Work around cache failure for wikitech (duration: 01m 28s)
* 23:05 tzatziki: resetting email for vanished user
* 21:51 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:44 urbanecm: Deploy security patch for [[phab:T289063|T289063]]
* 20:30 brennen: running scap pull on mw2383
* 20:29 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.16 (duration: 02m 01s)
* 20:20 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.15 (duration: 06m 51s)
* 20:14 brennen: pruning 1.37.0-wmf.15 and .16 ([[phab:T281160|T281160]])
* 20:06 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.18/includes/block/BlockUser.php: {{Gerrit|d377d4fae704640c81172a6fa94b12b2efdba42c}}: BlockUser: Restore blocking autoblocked IP addresses ([[phab:T287798|T287798]]) (duration: 01m 08s)
* 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.19
* 19:02 brennen: 1.37.0-wmf.19 train status: no current blockers, proceeding to group0 ([[phab:T281160|T281160]])
* 17:44 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/includes/: Backport: [[gerrit:713506{{!}}Revert "objectcache: make use of new `modtoken` field in SqlBagOStuff" (T288998)]] (duration: 01m 13s)
* 17:41 urbanecm: [urbanecm@mw2383 ~]$ scap pull # to clear an icinga alert
* 17:39 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.19/includes/: Backport: [[gerrit:713365{{!}}Revert "objectcache: make use of new `modtoken` field in SqlBagOStuff" (T288998)]] (duration: 01m 14s)
* 17:15 bblack: authdns2001,dns[245]001: upgrade gdnsd package to 3.8.0-1~wmf1 (all authdns upgraded after this)
* 17:07 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:04 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:02 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:56 brennen@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.19 (duration: 38m 24s)
* 16:50 bblack: dns1001: upgrade gdnsd package to 3.8.0-1~wmf1
* 16:25 bblack: dns3001: upgrade gdnsd package to 3.8.0-1~wmf1
* 16:17 brennen@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.19
* 16:13 brennen: 1.37.0-wmf.19 train: running scap prep, branched at {{Gerrit|79c9b9e61350b0edd1acccb5e717875ba64cf9c1}}
* 16:08 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:06 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:55 urbanecm: Deploy a security patch for [[phab:T289064|T289064]]
* 15:37 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:32 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 15:06 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:41 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:37 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc2013 to primary of pc3 [[phab:T284825|T284825]] (duration: 00m 58s)
* 14:25 jynus: running a full testwiki media backup on a single thread, single worker [[phab:T262668|T262668]]
* 14:23 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 14:20 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc2012 to primary of pc2 [[phab:T284825|T284825]] (duration: 00m 59s)
* 13:53 jynus: rolling restart of minio on backup server
* 13:51 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 13:06 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'staging' .
* 12:55 jayme@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'blubberoid' for release 'production' .
* 11:29 phuedx@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Jobs/TallyElectionJob.php: Backport: [[gerrit:713361{{!}}tallyElectionJob: Catch and log exceptions (T288361)]] (duration: 00m 58s)
* 11:16 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: buster reimage [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17038 and previous config saved to /var/cache/conftool/dbconfig/20210817-111629-mvernon.json
* 11:15 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
* 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:09 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:01 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: buster reimage [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17037 and previous config saved to /var/cache/conftool/dbconfig/20210817-110125-mvernon.json
* 10:46 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: buster reimage [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17035 and previous config saved to /var/cache/conftool/dbconfig/20210817-104622-mvernon.json
* 10:31 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: buster reimage [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17034 and previous config saved to /var/cache/conftool/dbconfig/20210817-103118-mvernon.json
* 10:07 effie: enable puppet on mediawiki hosts
* 09:52 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2121.codfw.wmnet with reason: REIMAGE
* 09:50 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2121.codfw.wmnet with reason: REIMAGE
* 09:20 mvernon@cumin1001: dbctl commit (dc=all): 'db2121 depooling: reimage to buster [[phab:T288244|T288244]]', diff saved to https://phabricator.wikimedia.org/P17033 and previous config saved to /var/cache/conftool/dbconfig/20210817-092045-mvernon.json
* 09:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1456.eqiad.wmnet
* 09:16 Emperor: reimaging db2121 to buster [[phab:T288244|T288244]]
* 09:08 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1456.eqiad.wmnet
* 08:37 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1276-1279].eqiad.wmnet
* 08:29 effie: disable puppet on mediawiki hosts to merge 712920
* 08:24 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1276-1279].eqiad.wmnet
* 08:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1456.eqiad.wmnet with reason: new setup
* 08:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1456.eqiad.wmnet with reason: new setup
* 08:21 mutante: mw2383 - scap pull (still depooled because [[phab:T286463|T286463]] but alerts in Icinga since a while)
* 08:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1456.eqiad.wmnet with reason: REIMAGE
* 08:18 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 08:18 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw127[6-9].eqiad.wmnet
* 08:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1456.eqiad.wmnet with reason: REIMAGE
* 08:17 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw127[6-9].eqiad.wmnet
* 08:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1276-1279].eqiad.wmnet with reason: decom old appservers in eqiad [[phab:T280203|T280203]]
* 08:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1276-1279].eqiad.wmnet with reason: decom old appservers in eqiad [[phab:T280203|T280203]]
* 08:06 zpapierski@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 08:00 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw144[7-9].eqiad.wmnet
* 07:59 mutante: mw1384 - start failed ferm service
* 07:59 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw1450.eqiad.wmnet
* 07:52 mutante: mw1451 through mw1455 - fresh hardware pooled the first time as appservers
* 07:50 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw145[1-5].eqiad.wmnet
* 07:49 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw145[1-5].eqiad.wmnet
* 07:48 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw145[1-5].eqiad.wmnet
* 07:44 marostegui: Drop aft_feedback tables on x1 [[phab:T250715|T250715]]
* 07:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw1450.eqiad.wmnet
* 07:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[7-9].eqiad.wmnet
* 06:57 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Entities/Election.php: [[phab:T288924|T288924]] (duration: 00m 57s)
* 06:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:55 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/cli/dump.php: [[phab:T288924|T288924]] (duration: 00m 58s)
* 06:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:59 TimStarling: foreachwikiindblist securepollglobal mysql.php --write -- -e 'insert into securepoll_properties (pr_entity,pr_key,pr_value) select el_entity,'\''mobile-jump-url'\'','\''https://vote.m.wikimedia.org/wiki/Special:SecurePoll'\'' from securepoll_elections where el_title='\''DWalden STV Election Test 456'\'' limit 1;'
* 05:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:43 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:37 tstarling@deploy1002: Finished scap: collected SecurePoll maintenance scripts and bug fix (duration: 04m 12s)
* 05:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:33 tstarling@deploy1002: Started scap: collected SecurePoll maintenance scripts and bug fix
* 05:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 05:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:11 eileen: civicrm revision changed from {{Gerrit|175a3101f7}} to {{Gerrit|66568246a2}}, config revision is {{Gerrit|7bdc78073d}}
* 02:39 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:44 eileen: civicrm revision changed from {{Gerrit|ba0c7705bb}} to {{Gerrit|175a3101f7}}, config revision is {{Gerrit|7bdc78073d}}
* 00:14 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|eccdd3ed3fda1abee9a4c57719afd0d1faae41c3}}: Growth mentor dashboard: Enable on testwiki ([[phab:T278920|T278920]]) (duration: 00m 59s)
 
== 2021-08-16 ==
* 23:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:20 urbanecm: Evening B&C window done
* 23:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|a14868bbdf442eede5711576c4b4da51df0ccd77}}: Enable NewUserMessage on hiwiktionary ([[phab:T287091|T287091]]) (duration: 01m 00s)
* 23:15 eileen: civicrm revision changed from {{Gerrit|1e32084622}} to {{Gerrit|ba0c7705bb}}, config revision is {{Gerrit|7bdc78073d}}
* 22:13 bblack: dns[1235]002: upgrade gdnsd package to 3.8.0-1~wmf1
* 21:31 bblack: authdns1001: upgrade gdnsd package to 3.8.0-1~wmf1
* 21:28 bblack: dns4002: upgrade gdnsd package to 3.8.0-1~wmf1
* 20:38 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 20:38 bstorm@cumin1001: Added views for new wiki: labswiki [[phab:T287442|T287442]]
* 20:37 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 20:36 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 20:36 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 20:35 bstorm@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
* 20:35 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 18:48 dancy: Restarted Jenkins due to stuck jobs.
* 18:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1453.eqiad.wmnet with reason: REIMAGE
* 17:49 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1453.eqiad.wmnet with reason: REIMAGE
* 17:34 cmjohnson1: installing new line card in slot1 cr2-eqiad [[phab:T277339|T277339]]
* 17:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:712965{{!}}Try to use EditStash before re-rendering (T288639)]] (duration: 00m 59s)
* 17:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:25 XioNoX: cr1-eqiad> request chassis fpc offline slot 5 - [[phab:T277339|T277339]]
* 17:17 cmjohnson1: installing new line card in slot1 cr1-eqiad [[phab:T277339|T277339]]
* 17:11 ejegg: updated fundraising CiviCRM from {{Gerrit|f3895dc907}} to {{Gerrit|1e32084622}}
* 17:08 XioNoX: asw2-a-eqiad> request virtual-chassis vc-port set pic-slot 1 member 8 port 1 - [[phab:T288834|T288834]]
* 17:05 XioNoX: asw2-a-eqiad> request virtual-chassis vc-port delete pic-slot 1 member 8 port 1 - [[phab:T288834|T288834]]
* 16:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:37 cwhite: restart logstash on logstash1008
* 16:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:01 mutante: LDAP - added user tandic to nda group ([[phab:T288527|T288527]])
* 15:37 ryankemper: [WDQS] Re-pooled `codfw`: `ryankemper@puppetmaster1001:~$ sudo -i confctl --quiet --object-type discovery select 'dnsdisc=wdqs,name=codfw' set/pooled=true`
* 14:42 mutante: miscweb - deploying new microsite for Wikidata Query Builder subpage ([[phab:T266703|T266703]])
* 14:41 mutante: mw1455 - works fine after a reimage, unknown why it didn't last time, but ok :)
* 14:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
* 14:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
* 13:53 mutante: mw1455 - mysteriously showing a bunch of issues in icinga, broken packages, envoy, memcached etc, after recent fresh install, trying another reimage ([[phab:T273915|T273915]])
* 13:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:711515{{!}}Remove $wmgWikibaseFineGrainedLuaTracking (T288612)]] (duration: 00m 58s)
* 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:40 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 13:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:40 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:711514{{!}}Stop setting $wgWBClientSettings['fineGrainedLuaTracking'] (T288612)]] (duration: 00m 58s)
* 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:711513{{!}}Remove $wmgWikibaseClientUseTermsTableSearchFields (T288612)]] (beta, 2/2) (duration: 00m 59s)
* 13:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:711513{{!}}Remove $wmgWikibaseClientUseTermsTableSearchFields (T288612)]] (prod, 1/2) (duration: 00m 59s)
* 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:711512{{!}}Stop setting 'useTermsTableSearchFields' Wikibase option (T288612)]] (duration: 00m 59s)
* 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:22 Lucas_WMDE: EU backport+config window done (slightly belatedly)
* 12:20 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:18 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Pages/VotePage.php: allow linking by title (duration: 00m 58s)
* 12:17 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2099.codfw.wmnet with reason: REIMAGE
* 12:15 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Math/src/HookHandlers/ParserHooksHandler.php: Backport: [[gerrit:712962{{!}}Support null content in parser tag hook (T288846)]] (hopefully also fixes [[phab:T288790|T288790]]) (duration: 00m 59s)
* 12:15 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2099.codfw.wmnet with reason: REIMAGE
* 12:14 kormat: clean up old /root/.my.cnf files [[phab:T150446|T150446]]
* 11:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:49 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:712754{{!}}Add extendedconfirmed on zhwiki (T287322)]] + Config: [[gerrit:713255{{!}}Fix extendedconfirmed for bots on zhwiki (T287322)]] (duration: 01m 01s)
* 11:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:26 Lucas_WMDE: namespaceDupes.php for [[phab:T287024|T287024]] finished
* 11:22 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes.php hrwiki --fix --add-prefix=[[phab:T287024|T287024]]/ {{!}} tee [[phab:T287024|T287024]].out # [[phab:T287024|T287024]]
* 11:12 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710564{{!}}Add namespace aliases for hr.wiki (T287024)]] (duration: 00m 59s)
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:40 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:32 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:713225{{!}}Add tags for wikidata edits (T236893)]] (duration: 00m 58s)
* 09:16 gehel: depooling wdqs codfw to allow catching up on lag
* 08:49 jynus: replacing s2 with s4 on db2097 [[phab:T287230|T287230]]
* 08:28 gehel: repool wdqs eqiad (`confctl --quiet --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=true`) - codfw currently overloaded
* 07:47 marostegui: Rename aft_feedback tables on db2115, db2131 - [[phab:T250715|T250715]]
* 06:41 TimStarling: on votewiki, set voter-privacy option to 1 on all prior elections [[phab:T288924|T288924]]
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17031 and previous config saved to /var/cache/conftool/dbconfig/20210816-055445-root.json
* 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17030 and previous config saved to /var/cache/conftool/dbconfig/20210816-055427-root.json
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17029 and previous config saved to /var/cache/conftool/dbconfig/20210816-053941-root.json
* 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17028 and previous config saved to /var/cache/conftool/dbconfig/20210816-053924-root.json
* 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17027 and previous config saved to /var/cache/conftool/dbconfig/20210816-052437-root.json
* 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17026 and previous config saved to /var/cache/conftool/dbconfig/20210816-052420-root.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17025 and previous config saved to /var/cache/conftool/dbconfig/20210816-050934-root.json
* 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17024 and previous config saved to /var/cache/conftool/dbconfig/20210816-050916-root.json
* 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3312 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17023 and previous config saved to /var/cache/conftool/dbconfig/20210816-045430-root.json
* 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2088:3311 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P17022 and previous config saved to /var/cache/conftool/dbconfig/20210816-045413-root.json
* 04:49 marostegui: Upgrade db2088 (s1 and s2) to 10.4.21
* 04:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088 (s1 and s2) to upgrade', diff saved to https://phabricator.wikimedia.org/P17021 and previous config saved to /var/cache/conftool/dbconfig/20210816-044906-marostegui.json
 
== 2021-08-15 ==
* 20:02 addshore: restarting blazegraph on wdqs2004
* 16:13 andrew@deploy1002: Finished deploy [horizon/deploy@c23a155]: adding cinder volume resize warning (duration: 03m 52s)
* 16:10 andrew@deploy1002: Started deploy [horizon/deploy@c23a155]: adding cinder volume resize warning
 
== 2021-08-14 ==
* 03:54 legoktm[m]: restarting mailman3 on lists1001, bounce runner crashed ([[phab:T288880|T288880]])
 
== 2021-08-13 ==
* 18:43 bblack: reprepro: uploaded gdnsd-3.8.0-1~wmf1 to buster-wikimedia - [[phab:T252132|T252132]]
* 17:32 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
* 17:32 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
* 17:06 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
* 17:05 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
* 15:39 mutante: mw1451, mw1452, mw1454 - rebooting after reimage, memcached needs one
* 15:30 mutante: mw1453 - racadm serveraction powercycle (host is down; it was working until right before the switch issue)
* 15:18 godog: restart pybal on lvs2009, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled
* 15:14 godog: restart pybal on lvs2010, to clear CRITICAL - thanos-swift_443: Servers thanos-fe2002.codfw.wmnet are marked down but pooled
* 15:02 mutante: etherpad1002 - started failed ferm
* 15:00 mutante: an-worker1117, an-worker1118 - started failed ferm (why are these slowly trickling in?)
* 14:57 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw1450.eqiad.wmnet
* 14:57 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw144[7-9].eqiad.wmnet
* 14:54 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup
* 14:54 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1451-1452,1454-1455].eqiad.wmnet with reason: new setup
* 14:50 mutante: an-worker1079 - started failed ferm
* 14:47 jelto@cumin1001: conftool action : set/weight=25; selector: name=mw1450.eqiad.wmnet
* 14:46 jelto@cumin1001: conftool action : set/weight=25; selector: name=mw144[7-9].eqiad.wmnet
* 14:45 mutante: an-worker1095 - started ferm, service failed
* 14:44 mutante: an-worker1082 - started ferm (was failed due to DNS hiccup)
* 14:44 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1450.eqiad.wmnet
* 14:43 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[7-9].eqiad.wmnet
* 14:41 mutante: mw1419 - started ferm
* 13:35 sukhe: ran homer for Gerrit 712400: Set up BGP peering to doh4002 in ulsfo
* 13:23 mutante: mw1453 - manual powercycle after it never rebooted when the reimage cookbook tried to trigger one
* 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
* 13:21 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1450.eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
* 13:21 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
* 13:21 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1447-1449].eqiad.wmnet with reason: setup new mediawiki servers in eqiad https://phabricator.wikimedia.org/T279309
* 12:54 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE
* 12:53 godog: set runtime envoy.reloadable_features.strict_1xx_and_204_response_headers=false on thanos-fe* - [[phab:T288815|T288815]]
* 12:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup
* 12:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: new setup
* 12:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1454.eqiad.wmnet with reason: REIMAGE
* 12:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE
* 12:31 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1451.eqiad.wmnet with reason: REIMAGE
* 12:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1452.eqiad.wmnet with reason: REIMAGE
* 12:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup
* 12:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1450.eqiad.wmnet with reason: new setup
* 12:29 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1450.eqiad.wmnet with reason: REIMAGE
* 12:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1451.eqiad.wmnet with reason: REIMAGE
* 12:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1450.eqiad.wmnet with reason: REIMAGE
* 12:26 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
* 12:24 urbanecm: mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=commonswiki --jobqueue # [[phab:T288683|T288683]]
* 12:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1455.eqiad.wmnet with reason: REIMAGE
* 12:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1449.eqiad.wmnet with reason: REIMAGE
* 12:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1448.eqiad.wmnet with reason: REIMAGE
* 12:21 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1444.eqiad.wmnet
* 12:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1449.eqiad.wmnet with reason: REIMAGE
* 12:21 mutante: mw1444 - scap pull, pooled as new API server for the first time
* 12:20 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1444.eqiad.wmnet
* 12:19 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1448.eqiad.wmnet with reason: REIMAGE
* 11:59 urbanecm: mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=mediawikiwiki --jobqueue # [[phab:T288683|T288683]]
* 11:36 topranks: cloudsw1-d5-eqiad - configuring new 2x40G trunk to cloudsw2-d5-eqiad with homer ([[phab:T277340|T277340]])
* 11:11 jelto: mw1455 - powering on via mgmt - OS install, initial setup ([[phab:T279309|T279309]], [[phab:T273915|T273915]])
* 10:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup
* 10:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup
* 10:07 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2003.codfw.wmnet
* 09:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup
* 09:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1444.eqiad.wmnet with reason: new setup
* 09:42 mutante: mw1448, mw1449, mw1450 - powering on via mgmt - OS install, initial setup ([[phab:T279309|T279309]], [[phab:T273915|T273915]])
* 09:38 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
* 09:35 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2003.codfw.wmnet with reason: REIMAGE
* 09:35 mutante: mw1444 - signed puppet cert, initial run (after hardware fix) [[phab:T279309|T279309]]
* 09:17 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=thanos-fe2003.codfw.wmnet
* 09:17 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2001.codfw.wmnet
* 09:15 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thanos-fe2002.codfw.wmnet
* 08:42 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
* 08:40 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet
* 08:40 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2002.codfw.wmnet with reason: REIMAGE
* 05:24 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1132.eqiad.wmnet with reason: REIMAGE
* 05:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1132.eqiad.wmnet with reason: REIMAGE
* 01:02 tgr: running extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php for Growth wikis
 
== 2021-08-12 ==
* 23:50 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:712732{{!}}Set archive namespaces on foundationwiki to 'noindex,follow' (T288763)]] (duration: 00m 59s)
* 23:47 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:38 cjming@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/GrowthExperiments: Backport: [[gerrit:711719{{!}}Add Link: fix invalidation on non-addlink edit (T283606)]] (duration: 01m 00s)
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:09 tgr: [[phab:T283867|T283867]] running userOptions.php on Growth wikis as per [[phab:T283867|T283867]]#7280296
* 22:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:57 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:711721{{!}}Don't generate HTML when asking for ParserOutput (T288639)]] (duration: 00m 58s)
* 21:52 urbanecm: Run `mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=$WIKI --jobqueue` for a bunch of Translate-enabled wikis ([[phab:T288683|T288683]])
* 21:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:30 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.18  refs [[phab:T281159|T281159]]
* 21:13 twentyafterfour@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: sync {{Gerrit|Ic27418a0ec976347be5fa586bbd32cc4a0d8d511}} to unblock the train refs [[phab:T288775|T288775]] and [[phab:T281159|T281159]] (duration: 01m 07s)
* 20:56 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=testwikidatawiki --jobqueue # [[phab:T288683|T288683]], errored out
* 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:54 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=testwiki --jobqueue # [[phab:T288683|T288683]]
* 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:24 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=wikimaniawiki --jobqueue # [[phab:T288683|T288683]]
* 20:13 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/Translate/scripts/refresh-translatable-pages.php --wiki=wikimaniawiki --jobqueue # [[phab:T288683|T288683]]
* 19:43 twentyafterfour@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Translate/src/PageTranslation/TranslationPage.php: sync {{Gerrit|I2f46abb20145630c27449ce57f1256e92f440144}} which should fix [[phab:T288683|T288683]] & [[phab:T288700|T288700]] thus unblocking the train: [[phab:T281159|T281159]] (duration: 01m 07s)
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh4002.wikimedia.org
* 16:37 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4002.wikimedia.org
* 16:33 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1005:  (duration: 00m 15s)
* 16:32 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1005:
* 16:32 effie: enabling puppet on mediawiki servers && rolling restart of mcrouter
* 16:31 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1006:  (duration: 00m 15s)
* 16:31 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1006:
* 16:31 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1007:  (duration: 00m 15s)
* 16:30 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1007:
* 16:29 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1008:  (duration: 00m 15s)
* 16:29 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1008:
* 16:29 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1009:  (duration: 00m 17s)
* 16:28 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1009:
* 16:27 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps1010:  (duration: 00m 15s)
* 16:27 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps1010:
* 16:26 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2005:  (duration: 00m 24s)
* 16:26 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2005:
* 16:24 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2006:  (duration: 00m 23s)
* 16:24 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2006:
* 16:23 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2007:  (duration: 00m 27s)
* 16:23 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2007:
* 16:22 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2008:  (duration: 00m 24s)
* 16:21 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2008:
* 16:16 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2009:  (duration: 00m 24s)
* 16:15 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2009:
* 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:14 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: maps2010:  (duration: 00m 23s)
* 16:14 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: maps2010:
* 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:13 mbsantos@deploy1002: Finished deploy [tilerator/deploy@b88cf50]: Deploy tilerator 1.1.7-beta.5 (duration: 02m 30s)
* 16:10 mbsantos@deploy1002: Started deploy [tilerator/deploy@b88cf50]: Deploy tilerator 1.1.7-beta.5
* 15:50 papaul: powerdown ms-be2060 for relocation
* 15:49 mutante: netbox - deleted 2620:0:863:1:198:35:26:6/64 (along with 198.35.26.6) due to the previous error when running makevm cookbook ([[phab:T288630|T288630]])
* 15:47 mutante: netbox - deleted 198.35.26.6 (doh4002)
* 15:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:37 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh4002.wikimedia.org
* 15:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 15:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4002.wikimedia.org
* 15:33 moritzm: importing openjdk-8 8u302-b08-1+deb11u1 to apt.wikimedia.org/component/jdk8  [[phab:T287960|T287960]]
* 15:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1002.eqiad.wmnet
* 15:07 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
* 15:04 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1003.eqiad.wmnet with reason: REIMAGE
* 15:00 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts druid1002.eqiad.wmnet
* 14:48 papaul: reset to factory ps-test-d8-codfw
* 14:35 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
* 14:33 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1002.eqiad.wmnet with reason: REIMAGE
* 14:33 papaul: reset to factory ps2-test-d8-codfw
* 14:25 hnowlan: reenabling puppet on P:cassandra
* 13:57 hnowlan: disabling puppet on P:cassandra to test removal of cassandra-metrics-agent
* 13:50 effie: disable puppet on mediawiki hosts to merge 705852
* 13:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
* 13:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts druid1003.eqiad.wmnet
* 13:20 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts druid1003.eqiad.wmnet
* 13:03 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
* 12:43 godog: upgrade NIC firmware on thanos-be2* / thanos-fe2* - [[phab:T286722|T286722]]
* 12:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
* 12:23 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
* 12:18 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1001.eqiad.wmnet with reason: REIMAGE
* 12:09 godog: upgrade NIC firmware on thanos-be1* - [[phab:T286722|T286722]]
* 12:08 godog: upgrade NIC firmware on thanos-fe100[34] - [[phab:T286722|T286722]]
* 12:04 godog: upgrade NIC firmware on thanos-fe100[12] - [[phab:T286722|T286722]]
* 11:57 moritzm: installing openexr security updates
* 11:47 moritzm: installing bluez security updates on buster
* 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Holger Knust out of all services on: 1743 hosts
* 10:22 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Holger Knust out of all services on: 1743 hosts
* 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2107 into API', diff saved to https://phabricator.wikimedia.org/P17016 and previous config saved to /var/cache/conftool/dbconfig/20210812-101840-marostegui.json
* 10:18 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:13 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
* 10:08 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
* 09:49 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
* 09:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:31 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Wikibase/: Backport: [[gerrit:711714{{!}}Revert "Inject NamespaceInfo into EntitySourceDefinitionsConfigParser" (T288724)]] (2/2) (duration: 01m 12s)
* 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Reconfiguring replication tree [[phab:T284825|T284825]]
* 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 8 hosts with reason: Reconfiguring replication tree [[phab:T284825|T284825]]
* 09:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:29 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Wikibase/data-access/: Backport: [[gerrit:711714{{!}}Revert "Inject NamespaceInfo into EntitySourceDefinitionsConfigParser" (T288724)]] (1/2) (duration: 01m 08s)
* 09:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P17015 and previous config saved to /var/cache/conftool/dbconfig/20210812-092909-root.json
* 09:28 kormat: reconfiguring replication tree for pc1 [[phab:T284825|T284825]]
* 09:27 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc2011 to primary of pc1 [[phab:T284825|T284825]] (duration: 01m 10s)
* 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 80%: After reimage', diff saved to https://phabricator.wikimedia.org/P17014 and previous config saved to /var/cache/conftool/dbconfig/20210812-091406-root.json
* 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 60%: After reimage', diff saved to https://phabricator.wikimedia.org/P17013 and previous config saved to /var/cache/conftool/dbconfig/20210812-085902-root.json
* 08:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:56 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:55 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudservices[1003-1004].wikimedia.org with reason: [[phab:T288725|T288725]]
* 08:55 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudservices[1003-1004].wikimedia.org with reason: [[phab:T288725|T288725]]
* 08:53 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Adding new pc hosts (duration: 01m 09s)
* 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
* 08:48 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
* 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P17012 and previous config saved to /var/cache/conftool/dbconfig/20210812-084359-root.json
* 08:43 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
* 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
* 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
* 08:38 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
* 08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
* 08:29 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 40%: After reimage', diff saved to https://phabricator.wikimedia.org/P17011 and previous config saved to /var/cache/conftool/dbconfig/20210812-082855-root.json
* 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
* 08:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
* 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 30%: After reimage', diff saved to https://phabricator.wikimedia.org/P17010 and previous config saved to /var/cache/conftool/dbconfig/20210812-081351-root.json
* 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 20%: After reimage', diff saved to https://phabricator.wikimedia.org/P17009 and previous config saved to /var/cache/conftool/dbconfig/20210812-075848-root.json
* 07:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
* 07:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
* 07:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
* 07:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
* 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 15%: After reimage', diff saved to https://phabricator.wikimedia.org/P17008 and previous config saved to /var/cache/conftool/dbconfig/20210812-074344-root.json
* 07:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
* 07:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
* 07:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
* 07:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
* 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
* 07:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
* 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P17007 and previous config saved to /var/cache/conftool/dbconfig/20210812-072841-root.json
* 07:26 godog: temp upgrade thanos to 0.22.0 on thanos-fe2001 to help debug a potential upstream issue
* 07:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
* 07:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
* 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
* 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
* 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
* 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P17006 and previous config saved to /var/cache/conftool/dbconfig/20210812-071337-root.json
* 07:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
* 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2107 (re)pooling @ 1%: After reimage', diff saved to https://phabricator.wikimedia.org/P17005 and previous config saved to /var/cache/conftool/dbconfig/20210812-065833-root.json
* 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/includes/Crypt/GpgCrypt.php: fix for [[phab:T288711|T288711]] failure of election creation (duration: 01m 09s)
* 06:47 moritzm: updating bullseye installations to the latest state of testing
* 06:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 06:36 moritzm: installing c-ares security updates on Bullseye
* 06:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:31 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 06:00 marostegui: Failover m3 from db1132 to db1107 - [[phab:T288197|T288197]]
* 05:15 ryankemper: [WDQS] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2005.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal after nuking wdqs2004's" --blazegraph_instance blazegraph`
* 05:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
* 05:14 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
* 04:45 eileen: tools revision changed from {{Gerrit|c26a8c0cb6}} to {{Gerrit|15bfaa7117}}
* 04:44 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
* 04:44 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
* 04:44 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
* 04:43 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@9d03aaa]: 0.3.81 (duration: 02m 07s)
* 04:41 ryankemper@deploy1002: Started deploy [wdqs/wdqs@9d03aaa]: 0.3.81
* 04:41 ryankemper: [WDQS Deploy] Re-rolling deploy so that `wdqs2004` gets deployed to
* 04:41 ryankemper: [WDQS] `wdqs2004`'s disk is full due to overinflated `wikidata.jnl`, nuking and depooling: `sudo rm -fv /srv/wdqs/wikidata.jnl && sudo depool`
* 04:40 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@9d03aaa]: 0.3.81 (duration: 17m 03s)
* 04:26 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.81` on canary `wdqs1003`; proceeding to rest of fleet
* 04:23 ryankemper@deploy1002: Started deploy [wdqs/wdqs@9d03aaa]: 0.3.81
* 04:21 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.81`. Pre-deploy tests passing on canary `wdqs1003`
* 03:40 eileen: process-control config revision is {{Gerrit|7bdc78073d}}
* 03:01 eileen: civicrm revision changed from {{Gerrit|d8ebf45819}} to {{Gerrit|f3895dc907}}, config revision is {{Gerrit|7bdc78073d}}
 
== 2021-08-11 ==
* 23:30 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:24 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cirrus: switch more_like traffic to codfw 2/2 (duration: 01m 08s)
* 23:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:06 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cirrus: switch more_like traffic to codfw 1/2 (duration: 01m 08s)
* 23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:32 legoktm@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Score/includes/Score.php: Record shell outs in statsd (duration: 01m 07s)
* 22:30 legoktm@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/Score/includes/Score.php: Record shell outs in statsd (duration: 01m 08s)
* 21:46 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:710725{{!}}Avoid using deprecated WikiPage::prepareContentForEdit (T288639)]] (duration: 01m 08s)
* 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:29 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SpamBlacklist/includes/SpamBlacklistHooks.php: Backport: [[gerrit:711706{{!}}Avoid using deprecated WikiPage::prepareContentForEdit (T288639)]] (duration: 01m 07s)
* 21:18 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:58 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 20:30 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki --move-talk --add-prefix=T288643 --fix # [[phab:T288643|T288643]]
* 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:23 mholloway-shell@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/Popups: Log VirtualPageView events to Event Platform ([[phab:T288655|T288655]]) (duration: 01m 06s)
* 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:20 mholloway-shell@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/Popups: Log VirtualPageView events to Event Platform ([[phab:T288655|T288655]]) (duration: 01m 09s)
* 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:35 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:29 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.18  refs [[phab:T281159|T281159]] (duration: 01m 08s)
* 19:28 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.18  refs [[phab:T281159|T281159]]
* 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.18  refs [[phab:T281159|T281159]]
* 19:01 jgleeson: payments-wiki updated from {{Gerrit|a70aaa7944}} to {{Gerrit|0a27dbe9b6}}
* 18:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
* 18:24 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 18:23 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 18:23 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 18:22 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 18:22 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 18:21 bstorm: removed thirdparty/kubeadm-k8s-1-17 in reprepro
* 18:21 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 18:20 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 18:19 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 18:04 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@563f876]: process_sparql_query: increase parallelism to help backfill (duration: 02m 21s)
* 18:02 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@563f876]: process_sparql_query: increase parallelism to help backfill
* 17:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:36 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:35 jforrester@deploy1002: Synchronized php-1.37.0-wmf.18/includes/specials/pagers/ContribsPager.php: [[phab:T288563|T288563]] Don't explode Special:Contributions on extension-formatted rows (3/3) (duration: 01m 06s)
* 17:34 jforrester@deploy1002: Synchronized php-1.37.0-wmf.18/includes/Revision/RevisionFactory.php: [[phab:T288563|T288563]] Don't explode Special:Contributions on extension-formatted rows (2/3) (duration: 01m 08s)
* 17:32 jforrester@deploy1002: Synchronized php-1.37.0-wmf.18/includes/Revision/RevisionStore.php: [[phab:T288563|T288563]] Don't explode Special:Contributions on extension-formatted rows (1/3) (duration: 01m 09s)
* 16:22 dancy: Results of testing php_fpm_always_restart:  php_fpm_always_restart=false: 1m19.942s    php_fpm_always_restart=true: 3m12.836s
* 16:19 dancy@deploy1002: Synchronized README: Testing scap php-fpm rolling restart (after) (duration: 03m 12s)
* 16:16 thcipriani: moment of truth for php-fpm-always-restart in scap
* 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
* 16:05 dancy@deploy1002: Synchronized README: Testing scap php-fpm rolling restart (before) (duration: 01m 19s)
* 15:37 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
* 15:12 moritzm: import openjdk-8 8u302-b08-1+wmf1 to bullseye-wikimedia (bootstrap build, not to be used yet) [[phab:T287960|T287960]]
* 15:02 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts bast4002.wikimedia.org
* 14:57 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
* 14:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts bast4002.wikimedia.org
* 14:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts bast4002.wikimedia.org
* 14:44 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts bast4002.wikimedia.org
* 14:44 sukhe: s/depool/decommission bast4002.wikimedia.org - [[phab:T288579|T288579]]
* 14:43 sukhe: depool bast4002.wikimedia.org - [[phab:T288579|T288579]]
* 14:23 moritzm: installing mx2002 [[phab:T286911|T286911]]
* 14:21 hnowlan: disabled cassandra-metrics-collector on maps*
* 13:33 moritzm: installing Java 8/Java 11 security updates on various analytics hosts
* 13:29 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
* 12:45 moritzm: imported openjdk-8 8u302-b08-1~deb10u1 to component/jdk8 for buster-wikimedia (forward port of the latest Java 8 security release)
* 12:32 godog: roll-restart prometheus [[phab:T284213|T284213]]
* 12:16 moritzm: installing c-ares security updates on stretch
* 12:16 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
* 12:14 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:08 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:33 Lucas_WMDE: EU backport+config window done
* 11:32 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:711141{{!}}Remove $wmgWikibaseClientEntityNamespaces (T257260)]] (duration: 01m 08s)
* 11:31 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:29 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:711140{{!}}Stop setting $wgWBClientSettings['entityNamespaces'] (T257260)]] (duration: 01m 07s)
* 11:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:711139{{!}}Remove $wmgWikibaseRepoEntityNamespaces (T257260)]] (duration: 01m 08s)
* 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:711138{{!}}Stop setting $wgWBRepoSettings['entityNamespaces'] (T257260)]] (duration: 01m 08s)
* 11:17 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:17 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
* 11:17 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/extensions/SecurePoll/: Backport: [[gerrit:710720{{!}}Add ad-hoc logging to tally process (T288366)]] (duration: 01m 09s)
* 11:11 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:06 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:711248{{!}}Disable Collection sidebar link on English Wikisource (T288021)]] (duration: 01m 14s)
* 10:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:42 moritzm: rolling restart of Buster-based maps services to pick up c-ares security updates
* 10:37 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:20 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:02 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons. - btullis@cumin1001
* 09:50 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.18/includes/specials/SpecialWhatLinksHere.php: Backport: [[gerrit:710719{{!}}Fix SelectQueryBuilder use in SpecialWhatLinksHere (T288565)]] (duration: 01m 08s)
* 09:50 godog: upgrade thanos on cloudmetrics* - [[phab:T288604|T288604]]
* 09:26 godog: upgrade thanos on prometheus* - [[phab:T288604|T288604]]
* 09:21 elukey: run "sudo find /var/log/airflow -type f -mtime +15 -delete" on an-airflow1001 to free space (root partition almost full)
* 09:19 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:15 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 09:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 09:05 godog: upgrade thanos on thanos-fe* - [[phab:T288604|T288604]]
* 08:23 kormat@deploy1002: Synchronized wmf-config/ProductionServices.php: Minor cleanup of parsercache entries (duration: 01m 17s)
* 08:19 moritzm: restart Aphlict to pick up c-ares security updates
* 08:17 moritzm: restart Turnilo on an-tool1007 to pick up c-ares security updates
* 08:02 moritzm: rolling restart of AQS to pick up the c-ares security update
* 07:09 moritzm: restart etherpad-lite on etherpad1002 to pick up c-ares security updates
* 06:59 _joe_: deleting the staging deployment of mwdebug
* 05:55 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db2107.codfw.wmnet with reason: REIMAGE
* 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2107.codfw.wmnet with reason: REIMAGE
* 05:22 marostegui: Stop replication on db2107 [[phab:T287454|T287454]]
* 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2107 [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P16999 and previous config saved to /var/cache/conftool/dbconfig/20210811-051856-marostegui.json
* 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2104 to s2 master and set section read-write [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P16998 and previous config saved to /var/cache/conftool/dbconfig/20210811-051041-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P16997 and previous config saved to /var/cache/conftool/dbconfig/20210811-050040-marostegui.json
* 05:00 marostegui: Starting s2 codfw failover from db2107 to db2104 - [[phab:T287454|T287454]]
* 04:16 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2104 with weight 0 [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P16996 and previous config saved to /var/cache/conftool/dbconfig/20210811-041625-root.json
* 04:15 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s2 [[phab:T287454|T287454]]
* 04:15 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s2 [[phab:T287454|T287454]]
* 03:45 razzi@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 03:45 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
* 01:49 dpifke@deploy1002: Finished deploy [performance/navtiming@12d8381]: Revert https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423 (duration: 00m 05s)
* 01:49 dpifke@deploy1002: Started deploy [performance/navtiming@12d8381]: Revert https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423
* 01:47 dpifke@deploy1002: Finished deploy [performance/navtiming@12d8381]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423 (duration: 00m 06s)
* 01:47 dpifke@deploy1002: Started deploy [performance/navtiming@12d8381]: Deploying https://gerrit.wikimedia.org/r/c/performance/navtiming/+/693423
* 01:38 legoktm@deploy1002: Synchronized docroot/noc/conf/index.php: noc: Expose primary datacenter on conf/ (duration: 01m 06s)
* 01:22 bstorm@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
* 01:22 bstorm@cumin1001: Added views for new wiki: jvwikisource [[phab:T286245|T286245]]
* 01:00 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
* 00:38 bstorm@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
* 00:36 bstorm@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
 
== 2021-08-10 ==
* 23:33 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710344{{!}}Enable user links feature for pilot wikis, modern vector (T288274)]] (duration: 01m 08s)
* 23:18 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:06 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I13e88c303a}}, [[phab:T284418|T284418]] (duration: 01m 07s)
* 23:02 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:58 eileen: process-control config revision is {{Gerrit|7bdc78073d}}
* 22:50 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I8052636}}, {{Gerrit|I2038702b7e0}} (duration: 01m 21s)
* 21:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1054.eqiad.wmnet with reason: REIMAGE
* 21:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1053.eqiad.wmnet with reason: REIMAGE
* 21:46 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1054.eqiad.wmnet with reason: REIMAGE
* 21:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1052.eqiad.wmnet with reason: REIMAGE
* 21:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1053.eqiad.wmnet with reason: REIMAGE
* 21:44 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1051.eqiad.wmnet with reason: REIMAGE
* 21:42 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1052.eqiad.wmnet with reason: REIMAGE
* 21:42 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1050.eqiad.wmnet with reason: REIMAGE
* 21:40 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1051.eqiad.wmnet with reason: REIMAGE
* 21:40 ryankemper: [WDQS] `ryankemper@wdqs2005:~$ sudo pool`
* 21:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1049.eqiad.wmnet with reason: REIMAGE
* 21:40 ryankemper: [[phab:T288501|T288501]] `ryankemper@wdqs2003:~$ sudo pool`
* 21:38 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1050.eqiad.wmnet with reason: REIMAGE
* 21:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1048.eqiad.wmnet with reason: REIMAGE
* 21:36 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1049.eqiad.wmnet with reason: REIMAGE
* 21:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1048.eqiad.wmnet with reason: REIMAGE
* 21:35 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1047.eqiad.wmnet with reason: REIMAGE
* 21:33 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1046.eqiad.wmnet with reason: REIMAGE
* 21:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1047.eqiad.wmnet with reason: REIMAGE
* 21:30 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1046.eqiad.wmnet with reason: REIMAGE
* 21:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.37.0-wmf.18"
* 21:02 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|I3b54d163b6}} (duration: 01m 09s)
* 20:54 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: {{Gerrit|If7a8d6b6}} (duration: 01m 22s)
* 20:43 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: REIMAGE
* 20:42 krinkle@deploy1002: Synchronized wmf-config/: {{Gerrit|Ic5ff34b}} (duration: 01m 08s)
* 20:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: REIMAGE
* 20:37 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1045.eqiad.wmnet with reason: REIMAGE
* 20:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1044.eqiad.wmnet with reason: REIMAGE
* 20:34 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1045.eqiad.wmnet with reason: REIMAGE
* 20:33 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1043.eqiad.wmnet with reason: REIMAGE
* 20:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1044.eqiad.wmnet with reason: REIMAGE
* 20:31 krinkle@deploy1002: Synchronized docroot/noc/: {{Gerrit|Ic013a93998f}} (duration: 01m 37s)
* 20:31 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1042.eqiad.wmnet with reason: REIMAGE
* 20:30 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1043.eqiad.wmnet with reason: REIMAGE
* 20:29 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1041.eqiad.wmnet with reason: REIMAGE
* 20:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1042.eqiad.wmnet with reason: REIMAGE
* 20:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1041.eqiad.wmnet with reason: REIMAGE
* 19:29 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1040.eqiad.wmnet with reason: REIMAGE
* 19:27 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mc1039.eqiad.wmnet with reason: REIMAGE
* 19:27 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1040.eqiad.wmnet with reason: REIMAGE
* 19:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1039.eqiad.wmnet with reason: REIMAGE
* 19:16 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 19:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1005.eqiad.wmnet with reason: REIMAGE
* 19:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
* 19:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1004.eqiad.wmnet with reason: REIMAGE
* 19:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti1023.eqiad.wmnet with reason: REIMAGE
* 19:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1005.eqiad.wmnet with reason: REIMAGE
* 19:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1024.eqiad.wmnet with reason: REIMAGE
* 19:04 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.18 refs [[phab:T281159|T281159]]
* 19:04 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1004.eqiad.wmnet with reason: REIMAGE
* 19:03 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1023.eqiad.wmnet with reason: REIMAGE
* 18:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:47 ryankemper: [WDQS] `ryankemper@wdqs2005:~$ sudo depool` (~1.26 hours of lag)
* 18:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:46 ryankemper: [[phab:T288501|T288501]] (Misread grafana graph, `wdqs2003` only has 1.33 hours to catch up on)
* 18:45 ryankemper: [[phab:T288501|T288501]] `data-transfer` of `wikidata.jnl` completed successfully. Host needs to catch up on ~22 hours of WDQS lag before being re-pooled
* 18:42 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
* 17:23 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.18 (duration: 36m 35s)
* 17:19 ryankemper: [[phab:T288501|T288501]] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2005.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh wikidata journal to resolve disk issue" --blazegraph_instance blazegraph` on `cumin2001` tmux session `wdqs_data_xfer`
* 17:19 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
* 17:18 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:13 ryankemper: [[phab:T288501|T288501]] [WDQS] `ryankemper@wdqs2003:~$ sudo rm -fv /srv/wdqs/wikidata.jnl`
* 17:09 razzi@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 17:09 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 17:06 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
* 17:02 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
* 17:02 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
* 17:01 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
* 16:49 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
* 16:49 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
* 16:47 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.18
* 16:36 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@d3c5363]: [[phab:T287225|T287225]]: Bump rdf-spark-tools to 0.3.81 (duration: 02m 10s)
* 16:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@d3c5363]: [[phab:T287225|T287225]]: Bump rdf-spark-tools to 0.3.81
* 16:33 btullis@cumin1001: END (FAIL) - Cookbook sre.druid.roll-restart-workers (exit_code=99) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
* 16:33 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - btullis@cumin1001
* 16:25 brennen: gitlab: run ansible to apply [[gerrit:710676{{!}}fix shell for backup cronjob]] ([[phab:T288324|T288324]])
* 16:01 moritzm: installing c-ares security updates on buster
* 14:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710515{{!}}Reduce ten seconds from dispatch max time (T288175)]] (duration: 00m 58s)
* 13:32 moritzm: updating bullseye installations to the latest state of testing
* 13:19 moritzm: installing perl security updates on Bullseye (older distros not affected)
* 13:00 jayme@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:54 ppchelko@deploy1002: Finished deploy [restbase/deploy@5791a7a]: Add count parameter to recommendations API [[phab:T287227|T287227]] (duration: 37m 18s)
* 12:42 lucaswerkmeister-wmde@deploy1002: Synchronized tests/multiversion/StaticSettingsTest.php: Config: [[gerrit:709504{{!}}Remove wmgWBRepoConceptBaseUri (T257260)]] (3/3, test) (duration: 00m 57s)
* 12:41 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:709504{{!}}Remove wmgWBRepoConceptBaseUri (T257260)]] (2/3, beta) (duration: 00m 57s)
* 12:39 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709504{{!}}Remove wmgWBRepoConceptBaseUri (T257260)]] (1/3, prod) (duration: 00m 57s)
* 12:36 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:709503{{!}}Stop setting $wgWBRepoSettings['conceptBaseUri'] (T257260)]] (duration: 00m 58s)
* 12:23 kormat: non-destructive (🤞) testing of db-switchover against s2/eqiad [[phab:T288500|T288500]]
* 12:17 ppchelko@deploy1002: Started deploy [restbase/deploy@5791a7a]: Add count parameter to recommendations API [[phab:T287227|T287227]]
* 11:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 11:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 10:56 marostegui: Install 10.4.21 on db1169 (s1)
* 10:54 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:53 mutante: etherpad deleting 2 pads as requested in [[phab:T288328|T288328]]
* 10:52 marostegui: Install 10.4.21 on db1096 (s5 and s6)
* 10:34 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:34 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:33 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 10:33 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 10:28 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:27 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:24 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:708309{{!}}Remove $wmgWikibaseClientRepoDatabase (T257260)]] (2/2, beta) (duration: 00m 57s)
* 09:54 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:708309{{!}}Remove $wmgWikibaseClientRepoDatabase (T257260)]] (1/2, prod) (duration: 00m 57s)
* 09:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:708308{{!}}Stop setting $wgWBClientSettings['repoDatabase'] (T257260)]] (duration: 00m 58s)
* 09:47 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:23 ariel@deploy1002: Finished deploy [dumps/dumps@72ff209]: refuse to use info from corrupt run settings file (duration: 00m 03s)
* 09:22 ariel@deploy1002: Started deploy [dumps/dumps@72ff209]: refuse to use info from corrupt run settings file
* 09:17 kormat: running non-destructive test against s7/codfw (db2107/db2014) [[phab:T288500|T288500]]
* 09:05 jayme@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:04 moritzm: removing stale Java 8 packages from logstash1024/1025/2023/2024/2025 (ELK7 Logstash cluster has been on Java 11 for a while now)
* 09:00 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:58 ariel@deploy1002: Finished deploy [dumps/dumps@170e394]: more resilience when reading bad run cache settings files (duration: 00m 03s)
* 08:58 ariel@deploy1002: Started deploy [dumps/dumps@170e394]: more resilience when reading bad run cache settings files
* 08:49 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:20 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:20 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:19 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 08:18 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 08:16 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 08:16 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 08:15 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 08:15 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 08:15 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 08:14 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 08:06 godog: upload thanos 0.21.1-1 and upgrade prometheus1004 / thanos-fe2001 to it - [[phab:T288326|T288326]]
* 08:03 moritzm: installing openjdk-8 security updates on stretch
* 07:33 moritzm: installing lynx security updates
* 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16987 and previous config saved to /var/cache/conftool/dbconfig/20210810-055642-root.json
* 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16986 and previous config saved to /var/cache/conftool/dbconfig/20210810-054139-root.json
* 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16985 and previous config saved to /var/cache/conftool/dbconfig/20210810-052635-root.json
* 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: repool after failed switchover', diff saved to https://phabricator.wikimedia.org/P16984 and previous config saved to /var/cache/conftool/dbconfig/20210810-051131-root.json
* 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 as read-write again - master has not been swapped [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P16983 and previous config saved to /var/cache/conftool/dbconfig/20210810-050604-root.json
* 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P16982 and previous config saved to /var/cache/conftool/dbconfig/20210810-050051-root.json
* 05:00 marostegui: Starting s2 codfw failover from db2107 to db2104 - [[phab:T287454|T287454]]
* 04:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s2 [[phab:T287454|T287454]]
* 04:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s2 [[phab:T287454|T287454]]
* 04:16 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2104 with weight 0 [[phab:T287454|T287454]]', diff saved to https://phabricator.wikimedia.org/P16981 and previous config saved to /var/cache/conftool/dbconfig/20210810-041627-root.json
* 02:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
 
== 2021-08-09 ==
* 16:12 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 16:10 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 16:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 16:07 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 16:07 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 16:07 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 16:04 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 16:03 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 16:03 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 16:03 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:02 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 16:02 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:00 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 16:00 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 16:00 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 15:57 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
* 15:34 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2065.codfw.wmnet
* 15:33 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2064.codfw.wmnet
* 15:33 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2062.codfw.wmnet
* 14:17 sukhe: ran homer for Gerrit 710358: Set up BGP peering to doh5002 in eqsin
* 14:10 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
* 14:09 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps100[1234].eqiad.wmnet
* 14:06 jayme: re-enabled (and ran) puppet on all kubernetes nodes - [[phab:T288345|T288345]]
* 14:05 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
* 14:05 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
* 14:05 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2063.codfw.wmnet
* 14:05 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
* 14:04 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
* 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
* 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:02 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: [[phab:T280886|T280886]] UCoC comment update (duration: 00m 58s)
* 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16979 and previous config saved to /var/cache/conftool/dbconfig/20210809-135805-root.json
* 13:52 kormat: disabling puppet on all db hosts for roll-out of [[phab:T285390|T285390]]
* 13:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 80%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16978 and previous config saved to /var/cache/conftool/dbconfig/20210809-134301-root.json
* 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 60%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16977 and previous config saved to /var/cache/conftool/dbconfig/20210809-132758-root.json
* 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 40%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16976 and previous config saved to /var/cache/conftool/dbconfig/20210809-131254-root.json
* 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 20%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16975 and previous config saved to /var/cache/conftool/dbconfig/20210809-125750-root.json
* 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'db2128 (re)pooling @ 10%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P16974 and previous config saved to /var/cache/conftool/dbconfig/20210809-124247-root.json
* 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2128 [[phab:T288398|T288398]]', diff saved to https://phabricator.wikimedia.org/P16973 and previous config saved to /var/cache/conftool/dbconfig/20210809-123852-marostegui.json
* 11:58 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
* 11:53 jayme: running puppet on kubernetes staging nodes (-b1 -s10) - [[phab:T288345|T288345]]
* 11:50 jayme: disabling puppet on all kubernetes nodes - [[phab:T288345|T288345]]
* 11:45 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:44 Lucas_WMDE: EU backport+config window done
* 11:44 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:43 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:705858{{!}}Remove wmgWikibaseClientRepoNamespaces (T257260)]] (duration: 00m 57s)
* 11:39 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:705857{{!}}Stop setting $wgWBClientSettings['repoNamespaces'] (T257260)]] (duration: 00m 57s)
* 11:37 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:706342{{!}}Remove wmgWikibaseClientRepositories (T257260)]] (2/2, beta) (duration: 00m 56s)
* 11:34 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:706342{{!}}Remove wmgWikibaseClientRepositories (T257260)]] (1/2, prod) (duration: 00m 57s)
* 11:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:706341{{!}}Stop setting $wgWBClientSettings['repositories'] (T257260)]] (duration: 00m 57s)
* 11:29 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1136.eqiad.wmnet with reason: REIMAGE
* 11:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1136.eqiad.wmnet with reason: REIMAGE
* 11:25 urbanecm: >>> \MediaWiki\MediaWikiServices::getInstance()->get('GrowthExperimentsWikiPageConfigLoader')->invalidate(Title::newFromText('MediaWiki:GrowthExperimentsConfig.json')) # dewiki shell.php; debugging Growth's wiki config
* 11:24 urbanecm@deploy1002: Synchronized wmf-config/config/dewiki.yaml: {{Gerrit|d6564351b28d3755369736f95c36063f8b980a22}}: dewiki: Enable Growth features in dark mode ([[phab:T288420|T288420]]; 3/3) (duration: 00m 57s)
* 11:23 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|d6564351b28d3755369736f95c36063f8b980a22}}: dewiki: Enable Growth features in dark mode ([[phab:T288420|T288420]]; 2/3) (duration: 00m 57s)
* 11:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|d6564351b28d3755369736f95c36063f8b980a22}}: dewiki: Enable Growth features in dark mode ([[phab:T288420|T288420]]; 1/3) (duration: 00m 57s)
* 11:19 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:16 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=dewiki --phab=[[phab:T288420|T288420]] # [[phab:T288420|T288420]]
* 11:15 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=dewiki growthexperiments # [[phab:T288420|T288420]]
* 11:15 urbanecm@deploy1002: Synchronized dblists/commonsuploads.dblist: {{Gerrit|9b9bb5b145fd67074c8122e0ddcba1b1e859bb78}}: Disable local uploads for non-administrators on nlwiki ([[phab:T288386|T288386]]) (duration: 00m 57s)
* 11:11 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|037aceb7f575d77930627f5062e183d514616f16}}: Enable GeoData on zhwikinews ([[phab:T287807|T287807]]) (duration: 00m 57s)
* 11:10 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 15 hosts with reason: Reimage db1136 (s7 primary) to buster [[phab:T288244|T288244]]
* 11:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 15 hosts with reason: Reimage db1136 (s7 primary) to buster [[phab:T288244|T288244]]
* 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|54c532f4d05c6c3f8ab39d3693e481a92d1ccdf7}}: Add *.happysrv.de to the wgCopyUploadsDomains allowlist of Wikimedia Commons ([[phab:T288039|T288039]]) (duration: 00m 58s)
* 10:44 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:42 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:36 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710938{{!}}Enable shellbox constraint for commons wikis (T176312)]] (duration: 00m 57s)
* 10:35 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:31 awight@deploy1002: sync-file aborted: Config: [[gerrit:709027{{!}}[beta] Enable new VE template dialog sidebar (T286765)]] (duration: 00m 23s)
* 10:27 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710936{{!}}Enable post edit constraint jobs in all edits (T204031)]] (duration: 00m 58s)
* 10:26 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:49 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710925{{!}}Increase post edit constraint jobs to 85% of edits (T204031)]] (duration: 00m 58s)
* 09:46 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1005.eqiad.wmnet with reason: REIMAGE
* 09:44 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1005.eqiad.wmnet with reason: REIMAGE
* 09:31 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[1234].codfw.wmnet
* 08:46 godog: upgrade prometheus on prometheus2004 - [[phab:T222113|T222113]]
* 08:41 godog: upgrade prometheus on prometheus1004 - [[phab:T222113|T222113]]
* 08:36 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov2002.codfw.wmnet with reason: REIMAGE
* 08:34 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov2002.codfw.wmnet with reason: REIMAGE
* 08:24 marostegui: Upgrade db1117 (all sections) to 10.4.19
* 08:03 ariel@deploy1002: Finished deploy [dumps/dumps@142e91c]: fix for [[phab:T288192|T288192]] runnerutils bug (duration: 00m 03s)
* 08:03 ariel@deploy1002: Started deploy [dumps/dumps@142e91c]: fix for [[phab:T288192|T288192]] runnerutils bug
* 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1160 [[phab:T288273|T288273]]', diff saved to https://phabricator.wikimedia.org/P16971 and previous config saved to /var/cache/conftool/dbconfig/20210809-075212-marostegui.json
* 07:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 07:30 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710919{{!}}Enable shellbox for constraint for all of wikidata (T176312)]] (duration: 00m 58s)
* 07:15 marostegui: Stop db1117:3323 to clone db1107 - [[phab:T288197|T288197]]
* 07:05 kart__: Updated cxserver to 2021-08-06-062053-production ([[phab:T288272|T288272]])
* 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1107.eqiad.wmnet with reason: REIMAGE
* 07:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1107.eqiad.wmnet with reason: REIMAGE
* 06:53 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:45 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 05:56 XioNoX: enable cloudsw1-c8 interfaces toward cloudsw2-c8 - [[phab:T277340|T277340]]
* 05:23 marostegui: Lag in s4 (commonswiki) will appear on clouddb* hosts (wiki replicas) [[phab:T288273|T288273]]
* 05:23 marostegui: Optimize commonswiki.image on eqiad, lag will appear - [[phab:T288273|T288273]]
 
== 2021-08-06 ==
* 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:12 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:53 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:53 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
* 18:52 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:45 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 18:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 17:39 brennen: gitlab: run ansible to apply [[gerrit:710529{{!}}remove backup warning for config backups]] ([[phab:T288324|T288324]])
* 16:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet
* 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts peek2001.codfw.wmnet
* 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Awaiting reimaging, depooled.
* 16:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1005.eqiad.wmnet with reason: Awaiting reimaging, depooled.
* 16:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:30 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts peek2001.codfw.wmnet
* 16:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 4:00:00 on peek2001.codfw.wmnet with reason: decom
* 16:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 4:00:00 on peek2001.codfw.wmnet with reason: decom
* 16:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:02 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:14 hnowlan: removing maps1005 from old maps cassandra cluster before reimaging
* 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
* 14:29 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2005.codfw.wmnet with reason: Reimaging
* 14:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2005.codfw.wmnet with reason: Reimaging
* 14:26 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on maps2005.codfw.wmnet with reason: REIMAGE
* 14:24 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2005.codfw.wmnet with reason: REIMAGE
* 13:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1006.eqiad.wmnet
* 13:07 zpapierski@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 12:56 godog: test thanos 0.22 on thanos-fe2001 - [[phab:T288326|T288326]]
* 12:48 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:34 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
* 12:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
* 12:25 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 12:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 12:23 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 12:22 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 12:22 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 12:22 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 12:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
* 12:21 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
* 12:20 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 12:20 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 11:45 jayme: enabling dragonfly dfdaemon on kubernetes200*
* 11:16 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1006.eqiad.wmnet with reason: REIMAGE
* 11:14 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1006.eqiad.wmnet with reason: REIMAGE
* 10:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
* 10:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1181.eqiad.wmnet with reason: REIMAGE
* 09:58 kormat: reimaging db1181 (s7) to buster [[phab:T288244|T288244]]
* 09:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2005.codfw.wmnet with reason: Rebuilding as buster replica of maps1009
* 09:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2005.codfw.wmnet with reason: Rebuilding as buster replica of maps1009
* 09:15 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet
* 09:14 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 08:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 08:10 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 08:09 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:58 godog: test thanos 0.21 on thanos-fe2001 - [[phab:T288326|T288326]]
* 07:48 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 07:36 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 07:32 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 07:15 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 07:02 dcausse@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 06:43 marostegui: Reboot db1107 to upgrade its kernel
* 05:47 marostegui: Optimize commonswiki.image on db1160 [[phab:T288273|T288273]]
* 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 [[phab:T288273|T288273]]', diff saved to https://phabricator.wikimedia.org/P16965 and previous config saved to /var/cache/conftool/dbconfig/20210806-054433-marostegui.json
* 05:44 eileen: civicrm revision changed from {{Gerrit|931b3defbe}} to {{Gerrit|c132d2f943}}, config revision is {{Gerrit|3696499932}}
* 04:03 TimStarling: on mwmaint1002 mwscript extensions/SecurePoll/cli/wm-scripts/makeGlobalVoterList.php --wiki=mediawikiwiki --edit-count-table=bv2021_edits --list-name=board-vote-2021 --short-min-edits=20 --long-min-edits=300
* 04:00 eileen: civicrm revision changed from {{Gerrit|e52f569991}} to {{Gerrit|931b3defbe}}, config revision is {{Gerrit|3696499932}}
* 03:54 tstarling@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/SecurePoll/cli/wm-scripts/makeGlobalVoterList.php: need to run this script [[phab:T288025|T288025]] (duration: 00m 57s)
* 03:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 01:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2065.codfw.wmnet with reason: REIMAGE
* 00:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2065.codfw.wmnet with reason: REIMAGE
* 00:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2064.codfw.wmnet with reason: REIMAGE
* 00:12 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2064.codfw.wmnet with reason: REIMAGE
* 00:03 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 00:03 egardner@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/MediaSearch: Backport: [[gerrit:710387{{!}}Revert "Open search result links in-place"]] (duration: 00m 58s)
* 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
 
== 2021-08-05 ==
* 23:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2063.codfw.wmnet with reason: REIMAGE
* 23:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2063.codfw.wmnet with reason: REIMAGE
* 23:17 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:16 legoktm@deploy1002: Synchronized php-1.37.0-wmf.17/includes/: Revert "Use CsrfTokenSet as CSRF token source" ([[phab:T287542|T287542]]) (duration: 01m 03s)
* 23:00 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2062.codfw.wmnet with reason: REIMAGE
* 22:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2062.codfw.wmnet with reason: REIMAGE
* 22:53 legoktm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/: Revert "Use CsrfTokenSet as CSRF token source" ([[phab:T287542|T287542]]) (duration: 01m 02s)
* 22:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:12 jforrester@deploy1002: Synchronized php-1.37.0-wmf.17/includes/content/: [[phab:T288191|T288191]]: Support deprecated Content::preSaveTransform override (2/2) (duration: 00m 55s)
* 22:11 jforrester@deploy1002: Synchronized php-1.37.0-wmf.17/includes/content/ContentHandler.php: [[phab:T288191|T288191]]: Support deprecated Content::preSaveTransform override (1/2) (duration: 01m 00s)
* 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:07 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:41 jforrester@deploy1002: Synchronized php-1.37.0-wmf.17/skins/MonoBook/resources/screen-common.less: [[phab:T288288|T288288]] Restore visualClear style to MonoBook so that footer doesn't show in the interwiki list (duration: 01m 24s)
* 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:03 ejegg: updated payments-wiki from {{Gerrit|72fe99abb1}} to {{Gerrit|a70aaa7944}}
* 20:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2062.codfw.wmnet with reason: REIMAGE
* 20:48 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2062.codfw.wmnet with reason: REIMAGE
* 20:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 20:23 dduvall: 1.37.0-wmf.17 promoted to all wikis. no new errors or concerning rates ([[phab:T281158|T281158]]). fixes for open UBN [[phab:T288191|T288191]] will be handled via backport (see task discussion)
* 20:18 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 20:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 19:24 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:18 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.17
* 19:00 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:58 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710328{{!}}Increase the ratio for shellbox for constraints to 42% in Wikidata (T176312)]] (duration: 01m 06s)
* 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:28 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710318{{!}}Increase the ratio for shellbox for constraints to 21% in Wikidata (T176312)]] (duration: 01m 06s)
* 18:23 topranks: Adding peering to second router of Xiber LLC - AS393950 - on cr2-eqord (Equinix IX Chicago)
* 18:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|da36bc3a05101f56e357969371b91e05660b9560}}: DiscussionTools: Make sourcemodetoolbar available everywhere ([[phab:T287927|T287927]]) (duration: 01m 06s)
* 18:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|0a14eb418288ad8ea25c206d20f2bed589de8107}}: wikimediaEvents: Enable IP address copy action instrument on all wikis ([[phab:T279540|T279540]]) (duration: 01m 07s)
* 18:17 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/DiscussionTools/extension.json: {{Gerrit|91f7c0233e2573a629e92a4b14c9b4be2b401e2f}}: Change sourcemodetoolbar default to enabled when available ([[phab:T287927|T287927]]) (duration: 01m 06s)
* 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:16 urbanecm@deploy1002: sync-file aborted: {{Gerrit|91f7c0233e2573a629e92a4b14c9b4be2b401e2f}}: Change sourcemodetoolbar default to enabled when available ([[phab:T287927|T287927]]) (duration: 00m 04s)
* 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/DiscussionTools/extension.json: {{Gerrit|38a8658d81f16700accf0df68504a121ddf41ffb}}: Change sourcemodetoolbar default to enabled when available ([[phab:T287927|T287927]]) (duration: 01m 06s)
* 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:54 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:49 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710315{{!}}Increase the shellbox ratio to 5% for wikidata (T176312)]] (duration: 01m 15s)
* 17:43 elukey: upgrade helm3 to 3.6.3-1 on release*, contint*, chartmuseum*, deploy2002 (1002 was already done before)
* 17:43 herron: rolling restart eqiad logstash cluster for java updates
* 17:41 ebernhardson: restart airflow-<nowiki>{</nowiki>scheduler{{!}}webserver<nowiki>}</nowiki> on an-airflow1001 to pick up deployed plugin changes
* 17:36 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 17:32 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@9872df9]: pyspark generalization gerrit:709837 and 666774 (duration: 09m 01s)
* 17:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 17:25 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 17:25 Amir1: end of pdf rebuild on commonswiki ([[phab:T275268|T275268]])
* 17:23 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@9872df9]: pyspark generalization gerrit:709837 and 666774
* 17:15 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:52 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:48 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2006.codfw.wmnet
* 16:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:710297{{!}}Enable shellbox for constraints for 1% of wikidata (T176312)]] (duration: 01m 27s)
* 16:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1010.eqiad.wmnet
* 16:42 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:42 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:21 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:21 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:16 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:16 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:15 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2006: imposm: add codfw targets (duration: 00m 22s)
* 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:14 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2006: imposm: add codfw targets
* 16:14 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:13 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2007: imposm: add codfw targets (duration: 00m 25s)
* 16:12 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2007: imposm: add codfw targets
* 16:11 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2008: imposm: add codfw targets (duration: 00m 23s)
* 16:10 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2008: imposm: add codfw targets
* 16:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:10 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2009: imposm: add codfw targets (duration: 00m 29s)
* 16:10 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2009: imposm: add codfw targets
* 16:09 mbsantos@deploy1002: Finished deploy [tilerator/deploy@16dbc04]: maps2010: imposm: add codfw targets (duration: 00m 22s)
* 16:09 mbsantos@deploy1002: Started deploy [tilerator/deploy@16dbc04]: maps2010: imposm: add codfw targets
* 16:04 hnowlan: draining maps1006 from maps cassandra cluster
* 16:04 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: maps2006: tegola: mirror 5% of requests everywhere (duration: 00m 24s)
* 16:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:03 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: maps2006: tegola: mirror 5% of requests everywhere
* 16:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1006.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
* 16:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1006.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
* 16:02 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:02 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: maps2010: tegola: mirror 5% of requests everywhere (duration: 00m 21s)
* 16:02 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1006.eqiad.wmnet
* 16:01 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: maps2010: tegola: mirror 5% of requests everywhere
* 16:01 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: maps2009: tegola: mirror 5% of requests everywhere (duration: 00m 55s)
* 16:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 16:00 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 16:00 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: maps2009: tegola: mirror 5% of requests everywhere
* 15:59 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: maps2008: tegola: mirror 5% of requests everywhere (duration: 00m 21s)
* 15:59 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: maps2008: tegola: mirror 5% of requests everywhere
* 15:59 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:59 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0ea1846]: tegola: mirror 5% of requests everywhere (duration: 00m 22s)
* 15:58 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846]: tegola: mirror 5% of requests everywhere
* 15:57 mbsantos@deploy1002: deploy aborted: tegola: mirror 5% of requests everywhere (duration: 00m 03s)
* 15:57 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0ea1846] (imposm): tegola: mirror 5% of requests everywhere
* 15:54 herron: rolling restart codfw logstash elasticsearch cluster for java updates
* 15:52 elukey: upgrade helm3 to 3.6.3-1 on deploy1002
* 15:28 vgutierrez: pool lvs2009 - [[phab:T286881|T286881]]
* 15:27 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Deploy imposm to maps2006 (duration: 00m 20s)
* 15:27 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Deploy imposm to maps2006
* 15:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2009.codfw.wmnet with reason: [[phab:T286881|T286881]]
* 15:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2009.codfw.wmnet with reason: [[phab:T286881|T286881]]
* 15:11 vgutierrez: depool lvs2009 - [[phab:T286881|T286881]]
* 15:10 vgutierrez: pool lvs2008 - [[phab:T286881|T286881]]
* 14:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: [[phab:T286881|T286881]]
* 14:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: [[phab:T286881|T286881]]
* 14:52 vgutierrez: depool lvs2008 - [[phab:T286881|T286881]]
* 14:50 elukey: upload helm 3.6.3-1 to <nowiki>{</nowiki>buster,stretch<nowiki>}</nowiki>-wikimedia
* 14:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps1010.eqiad.wmnet with reason: Reimaging
* 14:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps1010.eqiad.wmnet with reason: Reimaging
* 14:24 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh5002.wikimedia.org
* 14:18 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on maps1010.eqiad.wmnet with reason: REIMAGE
* 14:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1010.eqiad.wmnet with reason: REIMAGE
* 14:14 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
* 14:12 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2006.codfw.wmnet with reason: REIMAGE
* 14:10 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2006.codfw.wmnet with reason: REIMAGE
* 14:00 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:49 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 13:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 13:48 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 13:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
* 13:44 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org
* 13:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
* 13:39 mutante: deleted reserved (not active) IP 103.102.166.5/28 from netbox ([[phab:T284246|T284246]])
* 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:35 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php: Backport: [[gerrit:710094{{!}}Add 'constraint-regex-checker' to isEnabled() check as well (T176312)]] (duration: 01m 06s)
* 13:25 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php: Backport: [[gerrit:710095{{!}}Add 'constraint-regex-checker' to isEnabled() check as well (T176312)]] (duration: 01m 19s)
* 13:12 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:08 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:54 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:52 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 12:52 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 12:44 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 12:44 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 12:44 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 12:44 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 11:59 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1010.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
* 11:59 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1010.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
* 11:58 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1010.eqiad.wmnet
* 11:55 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
* 11:47 XioNoX: prepare cloudsw1-c8-eqiad for cloudsw2-c8 - [[phab:T277340|T277340]]
* 11:41 hnowlan: removing maps2006 from old maps cassandra cluster
* 11:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2006.codfw.wmnet with reason: Rebuilding as buster replica of maps2009
* 11:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2006.codfw.wmnet with reason: Rebuilding as buster replica of maps2009
* 11:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: Rebuilding as buster replica of maps2009
* 11:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: Rebuilding as buster replica of maps2009
* 11:11 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2006.codfw.wmnet
* 11:07 topranks: Reconfiguring packet buffer partitioning on cloudsw-d5-eqiad [[phab:T288037|T288037]]
* 11:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:01 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
* 10:25 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709821{{!}}Add shellbox-constraint services and use them (T176312)]], Part III (duration: 01m 06s)
* 10:24 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: [[gerrit:709821{{!}}Add shellbox-constraint services and use them (T176312)]], Part II (duration: 01m 07s)
* 10:23 ladsgroup@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:709821{{!}}Add shellbox-constraint services and use them (T176312)]], Part I (duration: 01m 07s)
* 10:15 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:11 vgutierrez: restart acme-chief on acmechief1001
* 10:06 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dragonfly-supernode2001.codfw.wmnet
* 10:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 10:03 topranks: Reconfiguring packet buffer partitioning on cloudsw-c8-eqiad [[phab:T288036|T288036]]
* 10:01 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php: Backport: [[gerrit:710093{{!}}Route Shellbox requests to 'constraint-regex-checker' service (T176312)]] (duration: 01m 06s)
* 09:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php: Backport: [[gerrit:710092{{!}}Route Shellbox requests to 'constraint-regex-checker' service (T176312)]] (duration: 01m 27s)
* 09:56 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode2001.codfw.wmnet
* 09:49 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1008.eqiad.wmnet with reason: REIMAGE
* 09:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1008.eqiad.wmnet with reason: REIMAGE
* 09:28 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 09:19 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=registry2004.codfw.wmnet,dc=codfw,cluster=docker-registry
* 09:05 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 08:28 godog: bounce grafana to apply new settings - [[phab:T119719|T119719]]
* 08:00 marostegui: Failover m2 from db1107 to db1183 - [[phab:T287852|T287852]]
* 06:54 godog: prometheus/ops codfw +100G
* 06:15 godog: add back thanos-be1003 sdf1 in thanos-swift
* 04:03 ejegg: re-enabled fundraising scheduled jobs (process-control)
* 03:03 ejegg: disabled fundraising scheduled jobs (process-control)
* 02:50 TimStarling: on mwmaint1002 killing populateEditCount.php for loginwiki -- it's slow but it's not going to find any edits
* 02:46 eileen: civicrm revision changed from {{Gerrit|d6baf291f4}} to {{Gerrit|e52f569991}}, config revision is {{Gerrit|360c8a1f08}}
* 01:26 Krinkle: krinkle@mwmaint1002 Temporarily grant myself `translationadmin` on wikimania2016wiki in order to approve an edit given the FlaggedRevs-like nature of Translate
* 00:24 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Remove DynamicPageList from all Wikimania wikis except 2016 ([[phab:T287916|T287916]]) (duration: 01m 52s)
 
== 2021-08-04 ==
* 22:18 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@34cd541]: gerrit:709835 and 709836 (duration: 06m 52s)
* 22:11 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@34cd541]: gerrit:709835 and 709836
* 20:56 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:21 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:22 dduvall: 1.37.0-wmf.17 promoted to group1. no new errors or troubling error rates spotted ([[phab:T281158|T281158]])
* 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:12 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.17 (duration: 01m 15s)
* 19:11 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.17
* 18:45 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/GrowthExperiments/: {{Gerrit|5c3ac582335265287369e2d06332645ddbcba412}}: Fix array key handling for GEHelpPanelLinks in on-wiki config ([[phab:T288023|T288023]]) (duration: 01m 08s)
* 18:34 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
* 18:33 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2379.codfw.wmnet
* 18:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GrowthExperiments/: {{Gerrit|36a2b9f58148dad5434daa6d03b77f4c8b839314}}: Fix array key handling for GEHelpPanelLinks in on-wiki config ([[phab:T288023|T288023]]) (duration: 01m 06s)
* 18:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:25 mutante: mw2379, mw2380 - scap pull
* 18:16 brennen: gitlab1001: upgrading to 13.12.9
* 18:11 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:11 brennen: gitlab2001: upgrading to 13.12.9
* 18:10 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable canary events by default - [[phab:T287789|T287789]] (duration: 01m 06s)
* 18:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2380.codfw.wmnet with reason: reimage
* 18:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2380.codfw.wmnet with reason: reimage
* 18:01 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
* 18:01 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on mw2380.codfw.wmnet with reason: reimage
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2380.codfw.wmnet with reason: reimage
* 18:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2379.codfw.wmnet with reason: reimage
* 18:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2379.codfw.wmnet with reason: reimage
* 17:59 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
* 17:59 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2380.codfw.wmnet with reason: REIMAGE
* 17:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2379.codfw.wmnet with reason: REIMAGE
* 17:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2378.codfw.wmnet
* 17:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2377.codfw.wmnet
* 17:46 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2380.codfw.wmnet
* 17:46 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw237[7-9].codfw.wmnet
* 17:45 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2357.codfw.wmnet
* 17:41 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2357.codfw.wmnet
* 17:40 mutante: mw2357, mw2377, mw2378 - scap pull
* 17:40 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw2357.codfw.wmnet
* 17:39 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:29 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw238[1-2].codfw.wmnet
* 17:29 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 17:27 ejegg: updated payments-wiki config to {{Gerrit|360c8a1f08}}
* 17:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2355.codfw.wmnet
* 17:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2353.codfw.wmnet
* 17:25 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2351.codfw.wmnet
* 17:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2355.codfw.wmnet
* 17:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2353.codfw.wmnet
* 17:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2351.codfw.wmnet
* 17:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
* 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE
* 17:12 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/GrowthExperiments/maintenance/updateMenteeData.php: {{Gerrit|66c2c7593322dfc575edc818aaff8d9b79466bdd}}: updateMenteeData: Output how long the script took ([[phab:T287964|T287964]]) (duration: 01m 07s)
* 17:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: REIMAGE
* 17:11 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
* 17:10 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
* 17:10 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
* 17:09 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: REIMAGE
* 17:08 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2357.codfw.wmnet with reason: REIMAGE
* 16:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 16:55 mutante: mw2351, mw2353, mw2355 - scap pull
* 16:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
* 16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2355.codfw.wmnet with reason: reimage
* 16:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2355.codfw.wmnet with reason: reimage
* 16:23 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
* 16:23 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2357.codfw.wmnet with reason: reimage
* 16:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2357.codfw.wmnet with reason: reimage
* 16:22 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2353.codfw.wmnet with reason: reimage
* 16:22 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2353.codfw.wmnet with reason: reimage
* 16:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on mw2353.codfw.wmnet with reason: reimage
* 16:21 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2353.codfw.wmnet with reason: reimage
* 16:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
* 16:21 joe: find . -type f -delete on /var/cache/nginx-docker-registry on registry2*, the disk is too small for unbound cache *and* accepting large uploads
* 16:20 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2355.codfw.wmnet with reason: REIMAGE
* 16:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
* 16:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2353.codfw.wmnet with reason: REIMAGE
* 16:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2351.codfw.wmnet with reason: REIMAGE
* 16:15 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
* 16:15 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1008.eqiad.wmnet with reason: Rebuilding as buster replica of maps1009
* 16:14 hnowlan: draining maps1008 from cassandra cluster
* 16:13 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
* 16:02 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2357.codfw.wmnet with reason: reimage
* 16:02 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2357.codfw.wmnet with reason: reimage
* 16:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw2380.codfw.wmnet with reason: reimage
* 16:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw2380.codfw.wmnet with reason: reimage
* 16:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[2377-2379].codfw.wmnet with reason: reimage
* 16:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[2377-2379].codfw.wmnet with reason: reimage
* 15:58 mutante: mw2351, mw2353, mw2355, mw2357 - converting from appserver to jobrunner, mw2377, mw2378, mw2379, mw2380 - converting from jobrunner to appserver - for balancing of server types over rows
* 15:51 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
* 15:50 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw237[789].codfw.wmnet
* 15:48 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw235[1357].codfw.wmnet
* 15:47 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw235[1357].wmnet
* 14:30 godog: upgrade prometheus on cloudmetrics hosts - [[phab:T222113|T222113]]
* 14:28 godog: upgrade prometheus on prometheus4001 - [[phab:T222113|T222113]]
* 14:19 moritzm: imported gitlab-ce 13.12.9 to thirdparty/gitlab [[phab:T287671|T287671]]
* 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:17 godog: depool prometheus2004 and pool prometheus2003 - [[phab:T222113|T222113]]
* 14:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 18 hosts with reason: Firmware upgrade on db1104 (s8 primary) [[phab:T286226|T286226]]
* 14:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 18 hosts with reason: Firmware upgrade on db1104 (s8 primary) [[phab:T286226|T286226]]
* 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:02 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 13:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:50 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|5d7255c1127f951da59b9b48749fe9cf59e11930}}: jvwikisource: Add author namespace ([[phab:T286241|T286241]]) (duration: 01m 06s)
* 13:49 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 13:19 urbanecm: jvwikisource was created ([[phab:T286241|T286241]])
* 13:19 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 11s)
* 13:18 volans: upgraded python3-wmflib to v0.0.9 fleet wide
* 13:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating jvwikisource ([[phab:T286241|T286241]]) (duration: 01m 06s)
* 13:14 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating jvwikisource ([[phab:T286241|T286241]]) (duration: 01m 06s)
* 13:10 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating jvwikisource ([[phab:T286241|T286241]]) (duration: 01m 07s)
* 13:09 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating jvwikisource ([[phab:T286241|T286241]])
* 13:08 urbanecm@deploy1002: Synchronized dblists: Creating jvwikisource ([[phab:T286241|T286241]]) (duration: 01m 07s)
* 13:07 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating jvwikisource ([[phab:T286241|T286241]]) (duration: 01m 07s)
* 13:05 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating jvwikisource ([[phab:T286241|T286241]]) (duration: 01m 08s)
* 12:23 godog: depool prometheus2004 for upgrade - [[phab:T222113|T222113]]
* 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P16958 and previous config saved to /var/cache/conftool/dbconfig/20210804-120725-marostegui.json
* 12:03 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:53 reedy@deploy1002: Synchronized docroot/mediawiki.org/xml/index.html: [[phab:T288040|T288040]] (duration: 01m 08s)
* 11:46 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 11:43 moritzm: installing testvm2001 [[phab:T286206|T286206]]
* 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1170:3317 and db1101:3317 [[phab:T286888|T286888]]!', diff saved to https://phabricator.wikimedia.org/P16957 and previous config saved to /var/cache/conftool/dbconfig/20210804-113623-marostegui.json
* 11:24 phuedx@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/SecurePoll: Backport: [[gerrit:709974{{!}}Use real transactions when creating an election]] (duration: 01m 08s)
* 11:21 phuedx@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll: Backport: [[gerrit:709973{{!}}Use real transactions when creating an election]] (duration: 01m 19s)
* 10:53 jayme: running puppet on eqiad appservers
* 10:48 jayme: switch most eqiad appservers to appserver_dragonly role for testing - [[phab:T286054|T286054]]
* 10:29 jayme: importing dragonfly 1.0.6-1 (downgrade from 1.0.6-2) to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16955 and previous config saved to /var/cache/conftool/dbconfig/20210804-101719-marostegui.json
* 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
* 09:10 volans: uploaded python3-wmflib_0.0.9 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
* 08:55 legoktm@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=shellbox-constraints
* 08:51 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 08:41 godog: pool prometheus1003 (and depool prometheus1004 for testing 1003 only) - [[phab:T222113|T222113]]
* 08:27 legoktm: restarting pybal on lvs2009 to add shellbox-constraints service
* 08:24 legoktm: restarting pybal on lvs1015 to add shellbox-constraints service
* 08:22 legoktm: restarting pybal on lvs2010 to add shellbox-constraints service
* 08:18 legoktm: restarting pybal on lvs1016 to add shellbox-constraints service
* 08:00 godog: upgrade prometheus1003 - [[phab:T222113|T222113]]
* 06:53 moritzm: installing testvm2002 [[phab:T286206|T286206]]
* 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1174 and db1127 [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16954 and previous config saved to /var/cache/conftool/dbconfig/20210804-064548-marostegui.json
* 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1170:3312, db1105:3312, db1105:3311 [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16953 and previous config saved to /var/cache/conftool/dbconfig/20210804-060347-marostegui.json
* 05:35 joe: docker image prune on releases1002, [[phab:T288024|T288024]]
* 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16952 and previous config saved to /var/cache/conftool/dbconfig/20210804-050751-marostegui.json
* 04:54 TimStarling: on mwmaint2002: running bv2021/populateEditCounts.php on all wikis with one thread per section s1-s8
* 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 to clone db1170:3312 [[phab:T286888|T286888]]', diff saved to https://phabricator.wikimedia.org/P16950 and previous config saved to /var/cache/conftool/dbconfig/20210804-044507-marostegui.json
* 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174 to clone db1127 [[phab:T286763|T286763]]', diff saved to https://phabricator.wikimedia.org/P16948 and previous config saved to /var/cache/conftool/dbconfig/20210804-043438-marostegui.json
* 04:10 TimStarling: on mwmaint2002: creating bv2021_edits table on all wikis
* 03:58 tstarling@deploy1002: Synchronized php-1.37.0-wmf.17/extensions/SecurePoll: for bv2021/populateEditCount.php (duration: 01m 06s)
* 03:56 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/SecurePoll: for bv2021/populateEditCount.php (duration: 01m 18s)
* 03:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 03:27 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
 
== 2021-08-03 ==
* 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:17 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:709770{{!}}Re-enable commonswiki sister search (T277225)]] (duration: 01m 07s)
* 22:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 22:37 dduvall: re-rolled 1.37.0-wmf.17 to group0 following rollback and subsequent fixes for [[phab:T287988|T287988]] ([[phab:T281158|T281158]])
* 22:28 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 22:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 3/3) (duration: 01m 07s)
* 22:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/MySQLMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 22:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|7d286dc0feaef354943a70ee18014d55cbb2aefa}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 07s)
* 21:51 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output (duration: 45m 00s)
* 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:37 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 21:06 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2d533ba]: enable glent version marker in final output
* 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 20:46 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:45 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 13 days, 0:00:00 on mw2383.codfw.wmnet with reason: [[phab:T286463|T286463]]
* 20:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 00m 37s)
* 20:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: REVERT: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 00m 37s)
* 20:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/includes/libs/rdbms/database/position/DBMasterPos.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 2/3) (duration: 01m 07s)
* 20:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/autoload.php: {{Gerrit|2d4ea752ec6f412ba071ef46023c978d55afcd98}}: Add (MySQL/DB)PrimaryPos as an alias to (MySQL/DB)MasterPos ([[phab:T287988|T287988]]; 1/3) (duration: 01m 08s)
* 20:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: REIMAGE
* 20:13 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871] (duration: 05m 36s)
* 20:08 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ea78871]
* 20:08 otto@deploy1002: Finished deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871] (duration: 00m 07s)
* 20:07 otto@deploy1002: Started deploy [analytics/refinery@ea78871] (thin): Regular analytics weekly train THIN [analytics/refinery@ea78871]
* 20:03 otto@deploy1002: Finished deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871] (duration: 20m 38s)
* 19:57 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:52 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: REIMAGE
* 19:42 otto@deploy1002: Started deploy [analytics/refinery@ea78871]: Regular analytics weekly train [analytics/refinery@ea78871]
* 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:31 ryankemper: [[phab:T285355|T285355]] `ryankemper@an-web1001:~$ sudo run-puppet-agent` to establish `role(analytics_cluster::webserver)` on the host in preparation for upcoming cutover from `thorium`->`an-web1001`
* 19:31 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561] (duration: 05m 40s)
* 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:27 dduvall@deploy1002: rebuilt and synchronized wikiversions files: revert group0 wikis to 1.37.0-wmf.16
* 19:25 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aceb561]
* 19:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.17
* 19:01 otto@deploy1002: Finished deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561] (duration: 00m 07s)
* 19:01 otto@deploy1002: Started deploy [analytics/refinery@aceb561] (thin): Regular analytics weekly train THIN [analytics/refinery@aceb561]
* 19:00 otto@deploy1002: Finished deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561] (duration: 16m 25s)
* 18:47 Amir1: running mwscript migrateUserGroup.php --wiki=idwiki editor reviewer ([[phab:T286853|T286853]])
* 18:44 otto@deploy1002: Started deploy [analytics/refinery@aceb561]: Regular analytics weekly train [analytics/refinery@aceb561]
* 18:29 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.17 (duration: 36m 44s)
* 18:18 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:12 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:06 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization (duration: 00m 48s)
* 18:05 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@f0f70d1]: [[phab:T286642|T286642]] fixes to bulk daemon prioritization
* 17:52 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.17
* 16:59 hashar: Gerrit has been upgraded
* 16:47 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001 (duration: 00m 07s)
* 16:47 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit1001
* 16:45 urbanecm: Start server side upload for 1 video file ([[phab:T287957|T287957]])
* 16:45 hashar: Stopping Gerrit for upgrade
* 16:43 volans: upgraded spicerack to 0.0.57-1+deb10u1 on cumin1001
* 16:36 dancy@deploy1002: Finished deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001 (duration: 00m 10s)
* 16:36 dancy@deploy1002: Started deploy [gerrit/gerrit@244120b]: Gerrit to 3.3.5 on gerrit2001
* 16:27 hashar: Going to upgrade Gerrit 3.3 (scheduled maintenance)
* 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
* 16:00 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
* 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:30 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 15:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 15:25 moritzm: prune testvm2001 from Ganeti and clean up from Netbox (stuck in some inconsistent state the decom cookbook can't handle) [[phab:T286206|T286206]]
* 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2002.codfw.wmnet
* 15:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2002.codfw.wmnet
* 14:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts testvm2001.codfw.wmnet
* 14:49 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
* 14:32 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 14:27 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/*/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root, causing an analytics import job to fail), ping apergos
* 14:23 ottomata: chown dumpsgen and chmod 644 /data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json on labstore1006 and labstore1007 (it was only readable by root, causing an analytics import job to fail), ping apergos
* 14:13 ottomata: chown dumpsgen and chmod 644 dumpsdata1003:/data/xmldatadumps/public/lezwiki/20210801/dumpstatus.json (it was only readable by root, causing an analytics import job to fail), ping apergos
* 12:47 moritzm: restarting Tomcat on idp1001
* 12:05 moritzm: installing libgcrypt20 security updates
* 11:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 11:36 moritzm: updated bullseye d-i images to rc3 [[phab:T275873|T275873]]
* 11:28 godog: upgrade prometheus3001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 11:18 godog: upgrade prometheus5001 to 2.24.1+ds-1+wmf1 - [[phab:T222113|T222113]]
* 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
* 11:13 moritzm: rename Ganeti group for test cluster to row_D [[phab:T286206|T286206]]
* 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host testvm2001.codfw.wmnet
* 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
* 09:18 marostegui: Failover m1, m2 and m3-master [[phab:T287574|T287574]]
* 09:12 moritzm: installing php 7.0 security updates on stretch
* 09:11 jayme: importing dragonfly 1.0.6-2 to buster-wikimedia and stretch-wikimedia - [[phab:T286054|T286054]]
* 08:57 moritzm: installing pillow security updates on stretch
* 08:53 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:50 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:17 legoktm: pausing refreshLinks run against wikiversities while other issues are figured out
* 08:13 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:10 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: REIMAGE
* 08:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 08:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
* 07:42 moritzm: upgrading spicerack on cumin2002 to 0.0.57
* 06:31 kart__: Updated cxserver to 2021-08-02-164000-production ([[phab:T286473|T286473]])
* 06:26 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:20 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
* 06:15 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
* 04:37 marostegui: Disable puppet on dbproxy1014 dbproxy1013 dbproxy1020
* 00:43 reedy@deploy1002: Finished deploy [integration/docroot@f9d225d]: with less gref (duration: 00m 05s)
* 00:43 reedy@deploy1002: Started deploy [integration/docroot@f9d225d]: with less gref
* 00:29 reedy@deploy1002: Finished deploy [integration/docroot@f7df1c7]: (no justification provided) (duration: 00m 05s)
* 00:29 reedy@deploy1002: Started deploy [integration/docroot@f7df1c7]: (no justification provided)
* 00:22 reedy@deploy1002: Finished deploy [integration/docroot@3cff0e4]: (no justification provided) (duration: 00m 08s)
* 00:22 reedy@deploy1002: Started deploy [integration/docroot@3cff0e4]: (no justification provided)
 
== 2021-08-02 ==
* 23:58 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 23:50 legoktm@cumin1001: START - Cookbook sre.dns.netbox
* 23:38 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:38 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:38 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
* 23:35 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 23:21 legoktm: Previous sync also deployed {{Gerrit|c38998f03f}} "Stop enabling DPL on new wikis" ([[phab:T287380|T287380]])
* 23:18 legoktm@deploy1002: Synchronized dblists/: Move ruwikinews to large wikis dblist (2/2) (duration: 00m 56s)
* 23:16 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Move ruwikinews to large wikis dblist (1/2) (duration: 00m 57s)
* 21:31 tzatziki: removing 1 file for legal compliance
* 21:16 tzatziki: removing 7 files for legal compliance
* 19:35 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 19:15 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:13 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:05 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 19:01 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]])
* 19:00 urbanecm: Morning B&C window completed
* 19:00 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 2/2) (duration: 00m 56s)
* 18:59 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|bebf4a9819f80e19cbb94f115f47c1ff4d05b7d2}}: Enable Growth features on a couple of wikis in dark mode ([[phab:T287868|T287868]], [[phab:T287874|T287874]], [[phab:T287873|T287873]]; 1/2) (duration: 00m 57s)
* 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:50 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Stream config for android_notification_interaction - [[phab:T287652|T287652]] (duration: 00m 56s)
* 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:49 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:49 urbanecm: Run extensions/GrowthExperiments/maintenance/initWikiConfig.php on a couple of wikis to init on-wiki config for Growth features ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]])
* 18:48 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
* 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
* 18:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wikis ([[phab:T287876|T287876]], [[phab:T287871|T287871]], [[phab:T287878|T287878]], [[phab:T287880|T287880]], [[phab:T287875|T287875]], [[phab:T287879|T287879]], [[phab:T287872|T287872]]; 2/2) (duration: 00m 56s)
* 18:45 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: {{Gerrit|18cd360773a2a236f9817ac0a4eaf3790b6d8cff}}: Growth features: Enable features in dark mode on a few wiki